2016-03-03 03:01:35

by Jianyu Zhan

[permalink] [raw]
Subject: about get_maintainer.pl not showing the original author of the modified code

Hi, Joe,

recently in this patch https://lkml.org/lkml/2016/3/2/281,

I found the original author of the modified code was not
suggested as a patch reviewer.

This surprised me at first. But later I realized that it is not trivial
to define "original author", since the last modification might be
just code polishing. To dig out the original author, we might need
go up the commit history and git must be equipped with ability of pattern
recognization to analyze code.

What do you think?

Regards,
Jianyu Zhan


2016-03-03 03:32:36

by Joe Perches

[permalink] [raw]
Subject: Re: about get_maintainer.pl not showing the original author of the modified code

On Thu, 2016-03-03 at 11:00 +0800, Jianyu Zhan wrote:
> Hi, Joe,
>
> recently in this patch https://lkml.org/lkml/2016/3/2/281,
>
> I found the original author of the modified code was not
> suggested as a patch reviewer.
>
> This surprised me at first.??But later I realized that it is not trivial
> to define "original author", since the last modification might be
> just code polishing.??To dig out the original author, we might need
> go up the commit history and git must be equipped with ability of pattern
> recognization to analyze code.
>
> What do you think?

There are many options to the get_maintainer script.
You can see all of them with "./scripts/get_maintainer.pl --help"

Many are not enabled because run-time can be very long as
running various git commands can take awhile to complete.

$ ./scripts/get_maintainer.pl --git-blame <patch|-f <file>>

with your patch get_maintainer.pl and --git-blame gives:

$ ./scripts/get_maintainer.pl ~/1.diff --git-blame
Thomas Gleixner <[email protected]> (commit_signer:20/19=100%,authored:7/19=37%,added_lines:73/248=29%,removed_lines:38/119=32%)
Peter Zijlstra <[email protected]> (commit_signer:3/19=16%,authored:1/19=5%)
Andrew Morton <[email protected]> (commit_signer:3/19=16%,modified commits:2/3=67%)
Davidlohr Bueso <[email protected]> (commit_signer:3/19=16%,authored:3/19=16%,added_lines:113/248=46%,removed_lines:20/119=17%)
Ingo Molnar <[email protected]> (commit_signer:2/19=11%,modified commits:2/3=67%)
Dominik Dingel <[email protected]> (authored:1/19=5%)
Jann Horn <[email protected]> (authored:1/19=5%)
Sebastian Andrzej Siewior <[email protected]> (added_lines:29/248=12%)
"Kirill A. Shutemov" <[email protected]> (added_lines:14/248=6%,removed_lines:49/119=41%)
Darren Hart <[email protected]> (modified commits:1/3=33%)
Stephen Hemminger <[email protected]> (modified commits:1/3=33%)
Christian Borntraeger <[email protected]> (modified commits:1/3=33%)
[email protected] (open list)

vs

$ ./scripts/get_maintainer.pl ~/1.diff
Thomas Gleixner <[email protected]> (commit_signer:20/19=100%,authored:7/19=37%,added_lines:73/248=29%,removed_lines:38/119=32%)
Davidlohr Bueso <[email protected]> (commit_signer:3/19=16%,authored:3/19=16%,added_lines:113/248=46%,removed_lines:20/119=17%)
Peter Zijlstra <[email protected]> (commit_signer:3/19=16%)
Andrew Morton <[email protected]> (commit_signer:3/19=16%)
Ingo Molnar <[email protected]> (commit_signer:2/19=11%)
Rasmus Villemoes <[email protected]> (authored:1/19=5%)
Darren Hart <[email protected]> (authored:1/19=5%)
kbuild test robot <[email protected]> (authored:1/19=5%)
Sebastian Andrzej Siewior <[email protected]> (added_lines:29/248=12%)
"Kirill A. Shutemov" <[email protected]> (added_lines:14/248=6%,removed_lines:49/119=41%)

running git blame alone gives:

$ git blame -L1927,+8 kernel/futex.c
^1da177e (Linus Torvalds????????2005-04-16 15:20:36 -0700 1927)?
^1da177e (Linus Torvalds????????2005-04-16 15:20:36 -0700 1928)?????????/* In the common case we don't take the spinlock, which is nice. */
42d35d48 (Darren Hart???????????2008-12-29 15:49:53 -0800 1929) retry:
^1da177e (Linus Torvalds????????2005-04-16 15:20:36 -0700 1930)?????????lock_ptr = q->lock_ptr;
e91467ec (Christian Borntraeger 2006-08-05 12:13:52 -0700 1931)?????????barrier();
c80544dc (Stephen Hemminger?????2007-10-18 03:07:05 -0700 1932)?????????if (lock_ptr != NULL) {
^1da177e (Linus Torvalds????????2005-04-16 15:20:36 -0700 1933)?????????????????spin_lock(lock_ptr);
^1da177e (Linus Torvalds????????2005-04-16 15:20:36 -0700 1934)?????????????????/*

with whitespace ignored, it gives:

$ git blame -w -L1927,+8 kernel/futex.c?
^1da177e (Linus Torvalds????????2005-04-16 15:20:36 -0700 1927)?
^1da177e (Linus Torvalds????????2005-04-16 15:20:36 -0700 1928)?????????/* In the common case we don't take the spinlock, which is nice. */
^1da177e (Linus Torvalds????????2005-04-16 15:20:36 -0700 1929) retry:
^1da177e (Linus Torvalds????????2005-04-16 15:20:36 -0700 1930)?????????lock_ptr = q->lock_ptr;
e91467ec (Christian Borntraeger 2006-08-05 12:13:52 -0700 1931)?????????barrier();
c80544dc (Stephen Hemminger?????2007-10-18 03:07:05 -0700 1932)?????????if (lock_ptr != NULL) {
^1da177e (Linus Torvalds????????2005-04-16 15:20:36 -0700 1933)?????????????????spin_lock(lock_ptr);
^1da177e (Linus Torvalds????????2005-04-16 15:20:36 -0700 1934)?????????????????/*

so there's an argument that get_maintainer.pl should by default
ignore whitespace changes.

2016-03-03 04:10:36

by Jianyu Zhan

[permalink] [raw]
Subject: Re: about get_maintainer.pl not showing the original author of the modified code

On Thu, Mar 3, 2016 at 11:32 AM, Joe Perches <[email protected]> wrote:
> with your patch get_maintainer.pl and --git-blame gives:
>
> $ ./scripts/get_maintainer.pl ~/1.diff --git-blame
> Thomas Gleixner <[email protected]> (commit_signer:20/19=100%,authored:7/19=37%,added_lines:73/248=29%,removed_lines:38/119=32%)
> Peter Zijlstra <[email protected]> (commit_signer:3/19=16%,authored:1/19=5%)
> Andrew Morton <[email protected]> (commit_signer:3/19=16%,modified commits:2/3=67%)
> Davidlohr Bueso <[email protected]> (commit_signer:3/19=16%,authored:3/19=16%,added_lines:113/248=46%,removed_lines:20/119=17%)
> Ingo Molnar <[email protected]> (commit_signer:2/19=11%,modified commits:2/3=67%)
> Dominik Dingel <[email protected]> (authored:1/19=5%)
> Jann Horn <[email protected]> (authored:1/19=5%)
> Sebastian Andrzej Siewior <[email protected]> (added_lines:29/248=12%)
> "Kirill A. Shutemov" <[email protected]> (added_lines:14/248=6%,removed_lines:49/119=41%)
> Darren Hart <[email protected]> (modified commits:1/3=33%)
> Stephen Hemminger <[email protected]> (modified commits:1/3=33%)
> Christian Borntraeger <[email protected]> (modified commits:1/3=33%)
> [email protected] (open list)
>
> vs
>
> $ ./scripts/get_maintainer.pl ~/1.diff
> Thomas Gleixner <[email protected]> (commit_signer:20/19=100%,authored:7/19=37%,added_lines:73/248=29%,removed_lines:38/119=32%)
> Davidlohr Bueso <[email protected]> (commit_signer:3/19=16%,authored:3/19=16%,added_lines:113/248=46%,removed_lines:20/119=17%)
> Peter Zijlstra <[email protected]> (commit_signer:3/19=16%)
> Andrew Morton <[email protected]> (commit_signer:3/19=16%)
> Ingo Molnar <[email protected]> (commit_signer:2/19=11%)
> Rasmus Villemoes <[email protected]> (authored:1/19=5%)
> Darren Hart <[email protected]> (authored:1/19=5%)
> kbuild test robot <[email protected]> (authored:1/19=5%)
> Sebastian Andrzej Siewior <[email protected]> (added_lines:29/248=12%)
> "Kirill A. Shutemov" <[email protected]> (added_lines:14/248=6%,removed_lines:49/119=41%)


So original author's email would be spit out only if given
--git-blame, instead of by default,
which really surprises people. Since most time we will cc original
author, we have to use
--git-blame explicitly, thus pay the running time cost. So the runtime
argument don't stands,
why not just enable it by default?


Regards,
Jianyu Zhan

2016-03-03 04:24:44

by Joe Perches

[permalink] [raw]
Subject: Re: about get_maintainer.pl not showing the original author of the modified code

On Thu, 2016-03-03 at 12:09 +0800, Jianyu Zhan wrote:
> On Thu, Mar 3, 2016 at 11:32 AM, Joe Perches <[email protected]> wrote:
> >
> > with your patch get_maintainer.pl and --git-blame gives:
> >
> > $ ./scripts/get_maintainer.pl ~/1.diff --git-blame
> > Thomas Gleixner <[email protected]> (commit_signer:20/19=100%,authored:7/19=37%,added_lines:73/248=29%,removed_lines:38/119=32%)
> > Peter Zijlstra <[email protected]> (commit_signer:3/19=16%,authored:1/19=5%)
> > Andrew Morton <[email protected]> (commit_signer:3/19=16%,modified commits:2/3=67%)
> > Davidlohr Bueso <[email protected]> (commit_signer:3/19=16%,authored:3/19=16%,added_lines:113/248=46%,removed_lines:20/119=17%)
> > Ingo Molnar <[email protected]> (commit_signer:2/19=11%,modified commits:2/3=67%)
> > Dominik Dingel <[email protected]> (authored:1/19=5%)
> > Jann Horn <[email protected]> (authored:1/19=5%)
> > Sebastian Andrzej Siewior <[email protected]> (added_lines:29/248=12%)
> > "Kirill A. Shutemov" <[email protected]> (added_lines:14/248=6%,removed_lines:49/119=41%)
> > Darren Hart <[email protected]> (modified commits:1/3=33%)
> > Stephen Hemminger <[email protected]> (modified commits:1/3=33%)
> > Christian Borntraeger <[email protected]> (modified commits:1/3=33%)
> > [email protected] (open list)
> >
> > vs
> >
> > $ ./scripts/get_maintainer.pl ~/1.diff
> > Thomas Gleixner <[email protected]> (commit_signer:20/19=100%,authored:7/19=37%,added_lines:73/248=29%,removed_lines:38/119=32%)
> > Davidlohr Bueso <[email protected]> (commit_signer:3/19=16%,authored:3/19=16%,added_lines:113/248=46%,removed_lines:20/119=17%)
> > Peter Zijlstra <[email protected]> (commit_signer:3/19=16%)
> > Andrew Morton <[email protected]> (commit_signer:3/19=16%)
> > Ingo Molnar <[email protected]> (commit_signer:2/19=11%)
> > Rasmus Villemoes <[email protected]> (authored:1/19=5%)
> > Darren Hart <[email protected]> (authored:1/19=5%)
> > kbuild test robot <[email protected]> (authored:1/19=5%)
> > Sebastian Andrzej Siewior <[email protected]> (added_lines:29/248=12%)
> > "Kirill A. Shutemov" <[email protected]> (added_lines:14/248=6%,removed_lines:49/119=41%)
>
> So original author's email would be spit out only if given
> --git-blame, instead of by default,
> which really surprises people.??Since most time we will cc original
> author,??we have to use
> --git-blame explicitly, thus pay the running time cost. So the runtime
> ?argument don't stands,
> why not just enable it by default?

Several reasons:

Most of the time, git history isn't used at all
because there is a specific named maintainer.

history isn't always appropriate as the original
authors email address is old and bounces

a .get_maintainer.conf file can be used to set
whatever defaults anyone prefers.