2023-12-05 19:34:41

by Geert Uytterhoeven

[permalink] [raw]
Subject: [PATCH] checkpatch: Also accept commit ids with 13-40 chars of sha1

Documentation/dev-tools/checkpatch.rst says:

**GIT_COMMIT_ID**
The proper way to reference a commit id is:
commit <12+ chars of sha1> ("<title line>")

However, scripts/checkpatch.pl has two different checks: one warning
check accepting 12 characters exactly:

# Check Fixes: styles is correct
Please use correct Fixes: style 'Fixes: <12 chars of sha1> (\"<title line>\")'

and a second error check accepting 12-40 characters:

# Check for git id commit length and improperly formed commit descriptions
# A correctly formed commit description is:
# commit <SHA-1 hash length 12+ chars> ("Complete commit subject")
Please use git commit description style 'commit <12+ chars of sha1>

Hence patches containing commit ids with more than 12 characters are
flagged by checkpatch, and sometimes rejected by maintainers or
reviewers.

Fix this by aligning the first check with the second check, and with the
documentation.

Fixes: bd17e036b495bebb ("checkpatch: warn for non-standard fixes tag style")
Signed-off-by: Geert Uytterhoeven <[email protected]>
---
Perhaps the time is ripe to increase the minimum from 12 to 16 chars
(in a follow-up patch)?

Running git-unique-abbrev[1] on a tree containing v6.7-rc3 and all
stable releases gives:

12000853 objects
4: 12000853 / 65536
5: 12000717 / 1048423
6: 6130888 / 2703295
7: 525025 / 260563
8: 33736 / 16861
9: 2106 / 1053
10: 160 / 80
11: 10 / 5
12: 0 / 0
21cf4d54d3c702ac20c6747fa6d4f64dee07dd11
21cf4d54d3ced8a3e752030e483d72997721076d
8a048bbf89528d45c604aed68f7e0f0ef957067d
8a048bbf895b1359e4a33b779ea6d7386cfe4de2
d3ac4e475103c4364ecb47a6a55c114d7c42a014
d3ac4e47510ec0753ebe1e418a334ad202784aa8
d597639e2036f04f0226761e2d818b31f2db7820
d597639e203a100156501df8a0756fd09573e2de
ef91b6e893a00d903400f8e1303efc4d52b710af
ef91b6e893afc4c4ca488453ea9f19ced5fa5861

12000853 is still smaller than sqrt(16^12) = 16777216, but the safety
margin is getting smaller. E.g. my main work tree already contains
almost 18M objects. Hence the Birthday Paradox states that collisions
of 12 char sha1 values are imminent.

Note that we standardized on 12 chars in commit d311cd44545f2f69
("checkpatch: add test for commit id formatting style in commit log") in
v3.17. For comparison, running git-unique-abbrev on a tree with all
(upstream + stable) releases from that era gives:

4052307 objects
4: 4052307 / 65536
5: 3966948 / 940963
6: 869691 / 417363
7: 61208 / 30523
8: 3979 / 1989
9: 258 / 129
10: 24 / 12
11: 6 / 3
12: 0 / 0
21cf4d54d3c702ac20c6747fa6d4f64dee07dd11
21cf4d54d3ced8a3e752030e483d72997721076d
d597639e2036f04f0226761e2d818b31f2db7820
d597639e203a100156501df8a0756fd09573e2de
ef91b6e893a00d903400f8e1303efc4d52b710af
ef91b6e893afc4c4ca488453ea9f19ced5fa5861

So the number of objects increased threefold during the last 9 years.

Thanks for your comments!

[1] https://blog.cuviper.com/2013/11/10/how-short-can-git-abbreviate/
---
scripts/checkpatch.pl | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 25fdb7fda1128aa9..a4e178a68f6d1d5f 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3209,7 +3209,7 @@ sub process {
$tag_case = 0 if $tag eq "Fixes:";
$tag_space = 0 if ($line =~ /^fixes:? [0-9a-f]{5,} ($balanced_parens)/i);

- $id_length = 0 if ($orig_commit =~ /^[0-9a-f]{12}$/i);
+ $id_length = 0 if ($orig_commit =~ /^[0-9a-f]{12,40}$/i);
$id_case = 0 if ($orig_commit !~ /[A-F]/);

# Always strip leading/trailing parens then double quotes if existing
@@ -3226,7 +3226,7 @@ sub process {
if ($ctitle ne $title || $tag_case || $tag_space ||
$id_length || $id_case || !$title_has_quotes) {
if (WARN("BAD_FIXES_TAG",
- "Please use correct Fixes: style 'Fixes: <12 chars of sha1> (\"<title line>\")' - ie: 'Fixes: $cid (\"$ctitle\")'\n" . $herecurr) &&
+ "Please use correct Fixes: style 'Fixes: <12+ chars of sha1> (\"<title line>\")' - ie: 'Fixes: $cid (\"$ctitle\")'\n" . $herecurr) &&
$fix) {
$fixed[$fixlinenr] = "Fixes: $cid (\"$ctitle\")";
}
--
2.34.1


2023-12-05 20:06:03

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH] checkpatch: Also accept commit ids with 13-40 chars of sha1

On Tue, 2023-12-05 at 20:34 +0100, Geert Uytterhoeven wrote:
> Documentation/dev-tools/checkpatch.rst says:
>
> **GIT_COMMIT_ID**
> The proper way to reference a commit id is:
> commit <12+ chars of sha1> ("<title line>")

It's not just checkpatch.

Documentation/process/submitting-patches.rst:``git bisect``, please use the 'Fixes:' tag with the first 12 characters of


So that would need to be updated as well.

And 12 still has quite some headroom.

$ git rev-list --all --abbrev=0 --abbrev-commit | \
awk '{ a[length] += 1 } END { for (len in a) print len, a[len] }'
5 107
6 684276
7 505734
8 41769
9 2665
10 174
11 8


2023-12-06 07:46:27

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH] checkpatch: Also accept commit ids with 13-40 chars of sha1

Hi Joe,

On Tue, Dec 5, 2023 at 9:05 PM Joe Perches <[email protected]> wrote:
> On Tue, 2023-12-05 at 20:34 +0100, Geert Uytterhoeven wrote:
> > Documentation/dev-tools/checkpatch.rst says:
> >
> > **GIT_COMMIT_ID**
> > The proper way to reference a commit id is:
> > commit <12+ chars of sha1> ("<title line>")
>
> It's not just checkpatch.
>
> Documentation/process/submitting-patches.rst:``git bisect``, please use the 'Fixes:' tag with the first 12 characters of
>
>
> So that would need to be updated as well.

And:

Documentation/process/maintainer-tip.rst: - Fixes: 12char-SHA1
("sub/sys: Original subject line")

but the example uses 15:

Documentation/process/maintainer-tip.rst: Fixes: abcdef012345678
("x86/xxx: Replace foo with bar")

Documentation/process/researcher-guidelines.rst: Fixes:
aaaabbbbccccdddd ("Introduce support for FooBar")

16

Documentation/process/submitting-patches.rst: Commit
e21d2170f36602ae2708 ("video: remove unnecessary

20

> And 12 still has quite some headroom.
>
> $ git rev-list --all --abbrev=0 --abbrev-commit | \
> awk '{ a[length] += 1 } END { for (len in a) print len, a[len] }'
> 5 107
> 6 684276
> 7 505734
> 8 41769
> 9 2665
> 10 174
> 11 8

How many collisions do you need? These will be dereferenced years
from now.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2023-12-10 21:57:13

by Niklas Söderlund

[permalink] [raw]
Subject: Re: [PATCH] checkpatch: Also accept commit ids with 13-40 chars of sha1

Hi Geert,

Thanks for your patch.

On 2023-12-05 20:34:16 +0100, Geert Uytterhoeven wrote:
> Documentation/dev-tools/checkpatch.rst says:
>
> **GIT_COMMIT_ID**
> The proper way to reference a commit id is:
> commit <12+ chars of sha1> ("<title line>")
>
> However, scripts/checkpatch.pl has two different checks: one warning
> check accepting 12 characters exactly:
>
> # Check Fixes: styles is correct
> Please use correct Fixes: style 'Fixes: <12 chars of sha1> (\"<title line>\")'
>
> and a second error check accepting 12-40 characters:
>
> # Check for git id commit length and improperly formed commit descriptions
> # A correctly formed commit description is:
> # commit <SHA-1 hash length 12+ chars> ("Complete commit subject")
> Please use git commit description style 'commit <12+ chars of sha1>
>
> Hence patches containing commit ids with more than 12 characters are
> flagged by checkpatch, and sometimes rejected by maintainers or
> reviewers.

I agree, it's not nice that the two commit id checks do not agree on
length and that this should likely be aligned.

To clarify, the two commit id checks in checkpatch.pl do not conflict
with each other as one check Fixes tags while the other checks the
format when referring to commits in general, right?

The intention when adding the check for Fixes tags was to conform with
what is documented in [1]. And to enforce a minimum number of characters
as there where issues where too few where used.

>
> Fix this by aligning the first check with the second check, and with the
> documentation.

I think this change is a good idea, but the documentation should also be
updated.

1. https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes

>
> Fixes: bd17e036b495bebb ("checkpatch: warn for non-standard fixes tag style")
> Signed-off-by: Geert Uytterhoeven <[email protected]>
> ---
> Perhaps the time is ripe to increase the minimum from 12 to 16 chars
> (in a follow-up patch)?
>
> Running git-unique-abbrev[1] on a tree containing v6.7-rc3 and all
> stable releases gives:
>
> 12000853 objects
> 4: 12000853 / 65536
> 5: 12000717 / 1048423
> 6: 6130888 / 2703295
> 7: 525025 / 260563
> 8: 33736 / 16861
> 9: 2106 / 1053
> 10: 160 / 80
> 11: 10 / 5
> 12: 0 / 0
> 21cf4d54d3c702ac20c6747fa6d4f64dee07dd11
> 21cf4d54d3ced8a3e752030e483d72997721076d
> 8a048bbf89528d45c604aed68f7e0f0ef957067d
> 8a048bbf895b1359e4a33b779ea6d7386cfe4de2
> d3ac4e475103c4364ecb47a6a55c114d7c42a014
> d3ac4e47510ec0753ebe1e418a334ad202784aa8
> d597639e2036f04f0226761e2d818b31f2db7820
> d597639e203a100156501df8a0756fd09573e2de
> ef91b6e893a00d903400f8e1303efc4d52b710af
> ef91b6e893afc4c4ca488453ea9f19ced5fa5861
>
> 12000853 is still smaller than sqrt(16^12) = 16777216, but the safety
> margin is getting smaller. E.g. my main work tree already contains
> almost 18M objects. Hence the Birthday Paradox states that collisions
> of 12 char sha1 values are imminent.
>
> Note that we standardized on 12 chars in commit d311cd44545f2f69
> ("checkpatch: add test for commit id formatting style in commit log") in
> v3.17. For comparison, running git-unique-abbrev on a tree with all
> (upstream + stable) releases from that era gives:
>
> 4052307 objects
> 4: 4052307 / 65536
> 5: 3966948 / 940963
> 6: 869691 / 417363
> 7: 61208 / 30523
> 8: 3979 / 1989
> 9: 258 / 129
> 10: 24 / 12
> 11: 6 / 3
> 12: 0 / 0
> 21cf4d54d3c702ac20c6747fa6d4f64dee07dd11
> 21cf4d54d3ced8a3e752030e483d72997721076d
> d597639e2036f04f0226761e2d818b31f2db7820
> d597639e203a100156501df8a0756fd09573e2de
> ef91b6e893a00d903400f8e1303efc4d52b710af
> ef91b6e893afc4c4ca488453ea9f19ced5fa5861
>
> So the number of objects increased threefold during the last 9 years.
>
> Thanks for your comments!
>
> [1] https://blog.cuviper.com/2013/11/10/how-short-can-git-abbreviate/
> ---
> scripts/checkpatch.pl | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index 25fdb7fda1128aa9..a4e178a68f6d1d5f 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -3209,7 +3209,7 @@ sub process {
> $tag_case = 0 if $tag eq "Fixes:";
> $tag_space = 0 if ($line =~ /^fixes:? [0-9a-f]{5,} ($balanced_parens)/i);
>
> - $id_length = 0 if ($orig_commit =~ /^[0-9a-f]{12}$/i);
> + $id_length = 0 if ($orig_commit =~ /^[0-9a-f]{12,40}$/i);
> $id_case = 0 if ($orig_commit !~ /[A-F]/);
>
> # Always strip leading/trailing parens then double quotes if existing
> @@ -3226,7 +3226,7 @@ sub process {
> if ($ctitle ne $title || $tag_case || $tag_space ||
> $id_length || $id_case || !$title_has_quotes) {
> if (WARN("BAD_FIXES_TAG",
> - "Please use correct Fixes: style 'Fixes: <12 chars of sha1> (\"<title line>\")' - ie: 'Fixes: $cid (\"$ctitle\")'\n" . $herecurr) &&
> + "Please use correct Fixes: style 'Fixes: <12+ chars of sha1> (\"<title line>\")' - ie: 'Fixes: $cid (\"$ctitle\")'\n" . $herecurr) &&
> $fix) {
> $fixed[$fixlinenr] = "Fixes: $cid (\"$ctitle\")";
> }
> --
> 2.34.1
>

--
Kind Regards,
Niklas Söderlund