2023-10-04 21:21:56

by Justin Stitt

[permalink] [raw]
Subject: [PATCH] get_maintainer/MAINTAINERS: confine K content matching to patches

The current behavior of K: is a tad bit noisy. It matches against the
entire contents of files instead of just against the contents of a
patch.

This means that a patch with a single character change (fixing a typo or
whitespace or something) would still to/cc maintainers and lists if the
affected file matched against the regex pattern given in K:. For
example, if a file has the word "clang" in it then every single patch
touching that file will to/cc Nick, Nathan and some lists.

Let's change this behavior to only content match against patches
(subjects, message, diff) as this is what most people expect the
behavior already is. Most users of "K:" would prefer patch-only content
matching. If this is not the case let's add a new matching type as
proposed in [1].

This patch involves 1) ripping out the file-based keyword matching from
get_maintainer.pl and 2) updating the MAINTAINERS documentation to
reflect patch-only semantics.

Signed-off-by: Justin Stitt <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/ [1]
---
MAINTAINERS | 8 ++++----
scripts/get_maintainer.pl | 13 -------------
2 files changed, 4 insertions(+), 17 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 35977b269d5e..13e7f40ea70b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -51,13 +51,13 @@ Descriptions of section entries and preferred order
get_maintainer will not look at git log history when an F: pattern
match occurs. When an N: match occurs, git log history is used
to also notify the people that have git commit signatures.
- K: *Content regex* (perl extended) pattern match in a patch or file.
+ K: *Content regex* (perl extended) pattern match patch content
For instance:
K: of_get_profile
- matches patches or files that contain "of_get_profile"
+ matches patches that contain "of_get_profile"
K: \b(printk|pr_(info|err))\b
- matches patches or files that contain one or more of the words
- printk, pr_info or pr_err
+ matches patches that contain one or more of the words printk,
+ pr_info or pr_err
One regex pattern per line. Multiple K: lines acceptable.

Maintainers List
diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
index ab123b498fd9..ea58929287bf 100755
--- a/scripts/get_maintainer.pl
+++ b/scripts/get_maintainer.pl
@@ -548,19 +548,6 @@ foreach my $file (@ARGV) {
$file =~ s/^\Q${cur_path}\E//; #strip any absolute path
$file =~ s/^\Q${lk_path}\E//; #or the path to the lk tree
push(@files, $file);
- if ($file ne "MAINTAINERS" && -f $file && $keywords) {
- open(my $f, '<', $file)
- or die "$P: Can't open $file: $!\n";
- my $text = do { local($/) ; <$f> };
- close($f);
- if ($keywords) {
- foreach my $line (keys %keyword_hash) {
- if ($text =~ m/$keyword_hash{$line}/x) {
- push(@keyword_tvi, $line);
- }
- }
- }
- }
} else {
my $file_cnt = @files;
my $lastfile;

---
base-commit: cbf3a2cb156a2c911d8f38d8247814b4c07f49a2
change-id: 20231004-get_maintainer_change_k-46a2055e2c59

Best regards,
--
Justin Stitt <[email protected]>


2023-10-05 02:41:24

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH] get_maintainer/MAINTAINERS: confine K content matching to patches

On Wed, 2023-10-04 at 21:21 +0000, Justin Stitt wrote:
> The current behavior of K: is a tad bit noisy. It matches against the
> entire contents of files instead of just against the contents of a
> patch.
>
> This means that a patch with a single character change (fixing a typo or
> whitespace or something) would still to/cc maintainers and lists if the
> affected file matched against the regex pattern given in K:. For
> example, if a file has the word "clang" in it then every single patch
> touching that file will to/cc Nick, Nathan and some lists.
>
> Let's change this behavior to only content match against patches
> (subjects, message, diff) as this is what most people expect the
> behavior already is. Most users of "K:" would prefer patch-only content
> matching. If this is not the case let's add a new matching type as
> proposed in [1].

I'm glad to know you are coming around to my suggestion.

I believe the file-based keyword matching should _not_ be
removed and the option should be added for it like I suggested.

I also think it might be better to mark the "maintained" output
differently as something like "keyword matched" instead.


Something like:
---
scripts/get_maintainer.pl | 38 ++++++++++++++++++++------------------
1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
index ab123b498fd9..befae75e61ab 100755
--- a/scripts/get_maintainer.pl
+++ b/scripts/get_maintainer.pl
@@ -57,6 +57,7 @@ my $subsystem = 0;
my $status = 0;
my $letters = "";
my $keywords = 1;
+my $keywords_in_file = 0;
my $sections = 0;
my $email_file_emails = 0;
my $from_filename = 0;
@@ -272,6 +273,7 @@ if (!GetOptions(
'letters=s' => \$letters,
'pattern-depth=i' => \$pattern_depth,
'k|keywords!' => \$keywords,
+ 'kf|keywords-in-file!' => \$keywords_in_file,
'sections!' => \$sections,
'fe|file-emails!' => \$email_file_emails,
'f|file' => \$from_filename,
@@ -318,6 +320,7 @@ if ($sections || $letters ne "") {
$subsystem = 0;
$web = 0;
$keywords = 0;
+ $keywords_in_file = 0;
$interactive = 0;
} else {
my $selections = $email + $scm + $status + $subsystem + $web;
@@ -548,16 +551,14 @@ foreach my $file (@ARGV) {
$file =~ s/^\Q${cur_path}\E//; #strip any absolute path
$file =~ s/^\Q${lk_path}\E//; #or the path to the lk tree
push(@files, $file);
- if ($file ne "MAINTAINERS" && -f $file && $keywords) {
+ if ($file ne "MAINTAINERS" && -f $file && $keywords && $keywords_in_file) {
open(my $f, '<', $file)
or die "$P: Can't open $file: $!\n";
my $text = do { local($/) ; <$f> };
close($f);
- if ($keywords) {
- foreach my $line (keys %keyword_hash) {
- if ($text =~ m/$keyword_hash{$line}/x) {
- push(@keyword_tvi, $line);
- }
+ foreach my $line (keys %keyword_hash) {
+ if ($text =~ m/$keyword_hash{$line}/x) {
+ push(@keyword_tvi, $line);
}
}
}
@@ -919,7 +920,7 @@ sub get_maintainers {
}

foreach my $line (sort {$hash{$b} <=> $hash{$a}} keys %hash) {
- add_categories($line);
+ add_categories($line, "");
if ($sections) {
my $i;
my $start = find_starting_index($line);
@@ -947,7 +948,7 @@ sub get_maintainers {
if ($keywords) {
@keyword_tvi = sort_and_uniq(@keyword_tvi);
foreach my $line (@keyword_tvi) {
- add_categories($line);
+ add_categories($line, ":Keyword");
}
}

@@ -1076,6 +1077,7 @@ Output type options:
Other options:
--pattern-depth => Number of pattern directory traversals (default: 0 (all))
--keywords => scan patch for keywords (default: $keywords)
+ --keywords-in-file => scan file for keywords (default: $keywords_in_file)
--sections => print all of the subsystem sections with pattern matches
--letters => print all matching 'letter' types from all matching sections
--mailmap => use .mailmap file (default: $email_use_mailmap)
@@ -1086,7 +1088,7 @@ Other options:

Default options:
[--email --tree --nogit --git-fallback --m --r --n --l --multiline
- --pattern-depth=0 --remove-duplicates --rolestats]
+ --pattern-depth=0 --remove-duplicates --rolestats --keywords]

Notes:
Using "-f directory" may give unexpected results:
@@ -1312,7 +1314,7 @@ sub get_list_role {
}

sub add_categories {
- my ($index) = @_;
+ my ($index, $suffix) = @_;

my $i;
my $start = find_starting_index($index);
@@ -1342,7 +1344,7 @@ sub add_categories {
if (!$hash_list_to{lc($list_address)}) {
$hash_list_to{lc($list_address)} = 1;
push(@list_to, [$list_address,
- "subscriber list${list_role}"]);
+ "subscriber list${list_role}" . $suffix]);
}
}
} else {
@@ -1352,12 +1354,12 @@ sub add_categories {
if ($email_moderated_list) {
$hash_list_to{lc($list_address)} = 1;
push(@list_to, [$list_address,
- "moderated list${list_role}"]);
+ "moderated list${list_role}" . $suffix]);
}
} else {
$hash_list_to{lc($list_address)} = 1;
push(@list_to, [$list_address,
- "open list${list_role}"]);
+ "open list${list_role}" . $suffix]);
}
}
}
@@ -1365,19 +1367,19 @@ sub add_categories {
} elsif ($ptype eq "M") {
if ($email_maintainer) {
my $role = get_maintainer_role($i);
- push_email_addresses($pvalue, $role);
+ push_email_addresses($pvalue, $role . $suffix);
}
} elsif ($ptype eq "R") {
if ($email_reviewer) {
my $subsystem = get_subsystem_name($i);
- push_email_addresses($pvalue, "reviewer:$subsystem");
+ push_email_addresses($pvalue, "reviewer:$subsystem" . $suffix);
}
} elsif ($ptype eq "T") {
- push(@scm, $pvalue);
+ push(@scm, $pvalue . $suffix);
} elsif ($ptype eq "W") {
- push(@web, $pvalue);
+ push(@web, $pvalue . $suffix);
} elsif ($ptype eq "S") {
- push(@status, $pvalue);
+ push(@status, $pvalue . $suffix);
}
}
}

2023-10-05 18:07:42

by Justin Stitt

[permalink] [raw]
Subject: Re: [PATCH] get_maintainer/MAINTAINERS: confine K content matching to patches

On Wed, Oct 4, 2023 at 7:40 PM Joe Perches <[email protected]> wrote:
>
> On Wed, 2023-10-04 at 21:21 +0000, Justin Stitt wrote:
> > The current behavior of K: is a tad bit noisy. It matches against the
> > entire contents of files instead of just against the contents of a
> > patch.
> >
> > This means that a patch with a single character change (fixing a typo or
> > whitespace or something) would still to/cc maintainers and lists if the
> > affected file matched against the regex pattern given in K:. For
> > example, if a file has the word "clang" in it then every single patch
> > touching that file will to/cc Nick, Nathan and some lists.
> >
> > Let's change this behavior to only content match against patches
> > (subjects, message, diff) as this is what most people expect the
> > behavior already is. Most users of "K:" would prefer patch-only content
> > matching. If this is not the case let's add a new matching type as
> > proposed in [1].
>
> I'm glad to know you are coming around to my suggestion.
:)

>
> I believe the file-based keyword matching should _not_ be
> removed and the option should be added for it like I suggested.

Having a command line flag allowing get_maintainer.pl
users to decide the behavior of K: is weird to me. If I'm a maintainer setting
my K: in MAINTAINERS I want some sort of consistent behavior. Some
patches will start hitting mailing list that DO have keywords in the patch
and others, confusingly, not. I understand setting the default state of
that flag is probably good enough as it is what 99.9% of users will use.

What do you think about this issue in particular, though?

To note, we get some speed-up here as pattern matching a patch that
touches lots of files would result in searching all of them in their
entirety. Just removing this behavior _might_ have a measurable
speed-up for patch series touching dozens of files.

>
> I also think it might be better to mark the "maintained" output
> differently as something like "keyword matched" instead.
>
>
> Something like:
> ---
> scripts/get_maintainer.pl | 38 ++++++++++++++++++++------------------
> 1 file changed, 20 insertions(+), 18 deletions(-)
>
> diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
> index ab123b498fd9..befae75e61ab 100755
> --- a/scripts/get_maintainer.pl
> +++ b/scripts/get_maintainer.pl
> @@ -57,6 +57,7 @@ my $subsystem = 0;
> my $status = 0;
> my $letters = "";
> my $keywords = 1;
> +my $keywords_in_file = 0;
> my $sections = 0;
> my $email_file_emails = 0;
> my $from_filename = 0;
> @@ -272,6 +273,7 @@ if (!GetOptions(
> 'letters=s' => \$letters,
> 'pattern-depth=i' => \$pattern_depth,
> 'k|keywords!' => \$keywords,
> + 'kf|keywords-in-file!' => \$keywords_in_file,
> 'sections!' => \$sections,
> 'fe|file-emails!' => \$email_file_emails,
> 'f|file' => \$from_filename,
> @@ -318,6 +320,7 @@ if ($sections || $letters ne "") {
> $subsystem = 0;
> $web = 0;
> $keywords = 0;
> + $keywords_in_file = 0;
> $interactive = 0;
> } else {
> my $selections = $email + $scm + $status + $subsystem + $web;
> @@ -548,16 +551,14 @@ foreach my $file (@ARGV) {
> $file =~ s/^\Q${cur_path}\E//; #strip any absolute path
> $file =~ s/^\Q${lk_path}\E//; #or the path to the lk tree
> push(@files, $file);
> - if ($file ne "MAINTAINERS" && -f $file && $keywords) {
> + if ($file ne "MAINTAINERS" && -f $file && $keywords && $keywords_in_file) {
> open(my $f, '<', $file)
> or die "$P: Can't open $file: $!\n";
> my $text = do { local($/) ; <$f> };
> close($f);
> - if ($keywords) {
> - foreach my $line (keys %keyword_hash) {
> - if ($text =~ m/$keyword_hash{$line}/x) {
> - push(@keyword_tvi, $line);
> - }
> + foreach my $line (keys %keyword_hash) {
> + if ($text =~ m/$keyword_hash{$line}/x) {
> + push(@keyword_tvi, $line);
> }
> }
> }
> @@ -919,7 +920,7 @@ sub get_maintainers {
> }
>
> foreach my $line (sort {$hash{$b} <=> $hash{$a}} keys %hash) {
> - add_categories($line);
> + add_categories($line, "");
> if ($sections) {
> my $i;
> my $start = find_starting_index($line);
> @@ -947,7 +948,7 @@ sub get_maintainers {
> if ($keywords) {
> @keyword_tvi = sort_and_uniq(@keyword_tvi);
> foreach my $line (@keyword_tvi) {
> - add_categories($line);
> + add_categories($line, ":Keyword");
> }
> }
>
> @@ -1076,6 +1077,7 @@ Output type options:
> Other options:
> --pattern-depth => Number of pattern directory traversals (default: 0 (all))
> --keywords => scan patch for keywords (default: $keywords)
> + --keywords-in-file => scan file for keywords (default: $keywords_in_file)
> --sections => print all of the subsystem sections with pattern matches
> --letters => print all matching 'letter' types from all matching sections
> --mailmap => use .mailmap file (default: $email_use_mailmap)
> @@ -1086,7 +1088,7 @@ Other options:
>
> Default options:
> [--email --tree --nogit --git-fallback --m --r --n --l --multiline
> - --pattern-depth=0 --remove-duplicates --rolestats]
> + --pattern-depth=0 --remove-duplicates --rolestats --keywords]
>
> Notes:
> Using "-f directory" may give unexpected results:
> @@ -1312,7 +1314,7 @@ sub get_list_role {
> }
>
> sub add_categories {
> - my ($index) = @_;
> + my ($index, $suffix) = @_;
>
> my $i;
> my $start = find_starting_index($index);
> @@ -1342,7 +1344,7 @@ sub add_categories {
> if (!$hash_list_to{lc($list_address)}) {
> $hash_list_to{lc($list_address)} = 1;
> push(@list_to, [$list_address,
> - "subscriber list${list_role}"]);
> + "subscriber list${list_role}" . $suffix]);
> }
> }
> } else {
> @@ -1352,12 +1354,12 @@ sub add_categories {
> if ($email_moderated_list) {
> $hash_list_to{lc($list_address)} = 1;
> push(@list_to, [$list_address,
> - "moderated list${list_role}"]);
> + "moderated list${list_role}" . $suffix]);
> }
> } else {
> $hash_list_to{lc($list_address)} = 1;
> push(@list_to, [$list_address,
> - "open list${list_role}"]);
> + "open list${list_role}" . $suffix]);
> }
> }
> }
> @@ -1365,19 +1367,19 @@ sub add_categories {
> } elsif ($ptype eq "M") {
> if ($email_maintainer) {
> my $role = get_maintainer_role($i);
> - push_email_addresses($pvalue, $role);
> + push_email_addresses($pvalue, $role . $suffix);
> }
> } elsif ($ptype eq "R") {
> if ($email_reviewer) {
> my $subsystem = get_subsystem_name($i);
> - push_email_addresses($pvalue, "reviewer:$subsystem");
> + push_email_addresses($pvalue, "reviewer:$subsystem" . $suffix);
> }
> } elsif ($ptype eq "T") {
> - push(@scm, $pvalue);
> + push(@scm, $pvalue . $suffix);
> } elsif ($ptype eq "W") {
> - push(@web, $pvalue);
> + push(@web, $pvalue . $suffix);
> } elsif ($ptype eq "S") {
> - push(@status, $pvalue);
> + push(@status, $pvalue . $suffix);
> }
> }
> }
>

2023-10-05 18:16:49

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH] get_maintainer/MAINTAINERS: confine K content matching to patches

On Thu, 2023-10-05 at 11:06 -0700, Justin Stitt wrote:
> On Wed, Oct 4, 2023 at 7:40 PM Joe Perches <[email protected]> wrote:
> >
> > On Wed, 2023-10-04 at 21:21 +0000, Justin Stitt wrote:
> > > The current behavior of K: is a tad bit noisy. It matches against the
> > > entire contents of files instead of just against the contents of a
> > > patch.
> > >
> > > This means that a patch with a single character change (fixing a typo or
> > > whitespace or something) would still to/cc maintainers and lists if the
> > > affected file matched against the regex pattern given in K:. For
> > > example, if a file has the word "clang" in it then every single patch
> > > touching that file will to/cc Nick, Nathan and some lists.
> > >
> > > Let's change this behavior to only content match against patches
> > > (subjects, message, diff) as this is what most people expect the
> > > behavior already is. Most users of "K:" would prefer patch-only content
> > > matching. If this is not the case let's add a new matching type as
> > > proposed in [1].
> >
> > I'm glad to know you are coming around to my suggestion.
> :)
>
> >
> > I believe the file-based keyword matching should _not_ be
> > removed and the option should be added for it like I suggested.
>
> Having a command line flag allowing get_maintainer.pl
> users to decide the behavior of K: is weird to me. If I'm a maintainer setting
> my K: in MAINTAINERS I want some sort of consistent behavior. Some
> patches will start hitting mailing list that DO have keywords in the patch
> and others, confusingly, not.

Not true.

If a patch contains a keyword match, get_maintainers will _always_
show the K: keyword maintainers unless --nokeywords is specified
on the command line.

If a file contains a keyword match, it'll only show the K:
keyword if --keywords-in-file is set.

> To note, we get some speed-up here as pattern matching a patch that
> touches lots of files would result in searching all of them in their
> entirety. Just removing this behavior _might_ have a measurable
> speed-up for patch series touching dozens of files.

Again, not true.

Patches do _not_ scan the original modified files for keyword matches.
Only the patch itself is scanned. That's the current behavior as well.

2023-10-05 18:31:15

by Justin Stitt

[permalink] [raw]
Subject: Re: [PATCH] get_maintainer/MAINTAINERS: confine K content matching to patches

On Thu, Oct 5, 2023 at 11:15 AM Joe Perches <[email protected]> wrote:
>
> On Thu, 2023-10-05 at 11:06 -0700, Justin Stitt wrote:
> > On Wed, Oct 4, 2023 at 7:40 PM Joe Perches <[email protected]> wrote:
> > >
> > > On Wed, 2023-10-04 at 21:21 +0000, Justin Stitt wrote:
> > > > The current behavior of K: is a tad bit noisy. It matches against the
> > > > entire contents of files instead of just against the contents of a
> > > > patch.
> > > >
> > > > This means that a patch with a single character change (fixing a typo or
> > > > whitespace or something) would still to/cc maintainers and lists if the
> > > > affected file matched against the regex pattern given in K:. For
> > > > example, if a file has the word "clang" in it then every single patch
> > > > touching that file will to/cc Nick, Nathan and some lists.
> > > >
> > > > Let's change this behavior to only content match against patches
> > > > (subjects, message, diff) as this is what most people expect the
> > > > behavior already is. Most users of "K:" would prefer patch-only content
> > > > matching. If this is not the case let's add a new matching type as
> > > > proposed in [1].
> > >
> > > I'm glad to know you are coming around to my suggestion.
> > :)
> >
> > >
> > > I believe the file-based keyword matching should _not_ be
> > > removed and the option should be added for it like I suggested.
> >
> > Having a command line flag allowing get_maintainer.pl
> > users to decide the behavior of K: is weird to me. If I'm a maintainer setting
> > my K: in MAINTAINERS I want some sort of consistent behavior. Some
> > patches will start hitting mailing list that DO have keywords in the patch
> > and others, confusingly, not.
>
> Not true.
>
> If a patch contains a keyword match, get_maintainers will _always_
> show the K: keyword maintainers unless --nokeywords is specified
> on the command line.

...

>
> If a file contains a keyword match, it'll only show the K:
> keyword if --keywords-in-file is set.

Right, what I'm saying is a patch can arrive in a maintainer's inbox
wherein the patch itself has no mention of the keyword (if
get_maintainer user opted for --keywords-in-file). Just trying to
avoid some cases of the question: "Why is this in my inbox?"

>
> > To note, we get some speed-up here as pattern matching a patch that
> > touches lots of files would result in searching all of them in their
> > entirety. Just removing this behavior _might_ have a measurable
> > speed-up for patch series touching dozens of files.
>
> Again, not true.
>
> Patches do _not_ scan the original modified files for keyword matches.
> Only the patch itself is scanned. That's the current behavior as well.
>

Feel like I'm missing something here. How is K: matching keywords in
files without reading them.

If my patch touches 10 files then all 10 of those files are scanned for
K: matches right?

2023-10-05 18:42:51

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH] get_maintainer/MAINTAINERS: confine K content matching to patches

On Thu, 2023-10-05 at 11:30 -0700, Justin Stitt wrote:
> On Thu, Oct 5, 2023 at 11:15 AM Joe Perches <[email protected]> wrote:
> >
> > On Thu, 2023-10-05 at 11:06 -0700, Justin Stitt wrote:
> > > On Wed, Oct 4, 2023 at 7:40 PM Joe Perches <[email protected]> wrote:
> > > >
> > > > On Wed, 2023-10-04 at 21:21 +0000, Justin Stitt wrote:
> > > > > The current behavior of K: is a tad bit noisy. It matches against the
> > > > > entire contents of files instead of just against the contents of a
> > > > > patch.
> > > > >
> > > > > This means that a patch with a single character change (fixing a typo or
> > > > > whitespace or something) would still to/cc maintainers and lists if the
> > > > > affected file matched against the regex pattern given in K:. For
> > > > > example, if a file has the word "clang" in it then every single patch
> > > > > touching that file will to/cc Nick, Nathan and some lists.
> > > > >
> > > > > Let's change this behavior to only content match against patches
> > > > > (subjects, message, diff) as this is what most people expect the
> > > > > behavior already is. Most users of "K:" would prefer patch-only content
> > > > > matching. If this is not the case let's add a new matching type as
> > > > > proposed in [1].
> > > >
> > > > I'm glad to know you are coming around to my suggestion.
> > > :)
> > >
> > > >
> > > > I believe the file-based keyword matching should _not_ be
> > > > removed and the option should be added for it like I suggested.
> > >
> > > Having a command line flag allowing get_maintainer.pl
> > > users to decide the behavior of K: is weird to me. If I'm a maintainer setting
> > > my K: in MAINTAINERS I want some sort of consistent behavior. Some
> > > patches will start hitting mailing list that DO have keywords in the patch
> > > and others, confusingly, not.
> >
> > Not true.
> >
> > If a patch contains a keyword match, get_maintainers will _always_
> > show the K: keyword maintainers unless --nokeywords is specified
> > on the command line.
>
> ...
>
> >
> > If a file contains a keyword match, it'll only show the K:
> > keyword if --keywords-in-file is set.
>
> Right, what I'm saying is a patch can arrive in a maintainer's inbox
> wherein the patch itself has no mention of the keyword (if
> get_maintainer user opted for --keywords-in-file). Just trying to
> avoid some cases of the question: "Why is this in my inbox?"

Because the script user specifically asked for it.

> > > To note, we get some speed-up here as pattern matching a patch that
> > > touches lots of files would result in searching all of them in their
> > > entirety. Just removing this behavior _might_ have a measurable
> > > speed-up for patch series touching dozens of files.
> >
> > Again, not true.
> >
> > Patches do _not_ scan the original modified files for keyword matches.
> > Only the patch itself is scanned. That's the current behavior as well.
> >
>
> Feel like I'm missing something here. How is K: matching keywords in
> files without reading them.
>
> If my patch touches 10 files then all 10 of those files are scanned for
> K: matches right?

Nope.

Understand the patches are the input to get_maintainer and not
just files.

If a patch is fed to get_maintainer then any files modified by
the patch are _not_ scanned.

Only the patch _content_ is used for keyword matches.

2023-10-05 19:53:04

by Justin Stitt

[permalink] [raw]
Subject: Re: [PATCH] get_maintainer/MAINTAINERS: confine K content matching to patches

On Thu, Oct 5, 2023 at 11:42 AM Joe Perches <[email protected]> wrote:
>
> On Thu, 2023-10-05 at 11:30 -0700, Justin Stitt wrote:
> > On Thu, Oct 5, 2023 at 11:15 AM Joe Perches <[email protected]> wrote:
> > >
> > > On Thu, 2023-10-05 at 11:06 -0700, Justin Stitt wrote:
> > > > On Wed, Oct 4, 2023 at 7:40 PM Joe Perches <[email protected]> wrote:
> > > > >
> > > > > On Wed, 2023-10-04 at 21:21 +0000, Justin Stitt wrote:
> > > > > > The current behavior of K: is a tad bit noisy. It matches against the
> > > > > > entire contents of files instead of just against the contents of a
> > > > > > patch.
> > > > > >
> > > > > > This means that a patch with a single character change (fixing a typo or
> > > > > > whitespace or something) would still to/cc maintainers and lists if the
> > > > > > affected file matched against the regex pattern given in K:. For
> > > > > > example, if a file has the word "clang" in it then every single patch
> > > > > > touching that file will to/cc Nick, Nathan and some lists.
> > > > > >
> > > > > > Let's change this behavior to only content match against patches
> > > > > > (subjects, message, diff) as this is what most people expect the
> > > > > > behavior already is. Most users of "K:" would prefer patch-only content
> > > > > > matching. If this is not the case let's add a new matching type as
> > > > > > proposed in [1].
> > > > >
> > > > > I'm glad to know you are coming around to my suggestion.
> > > > :)
> > > >
> > > > >
> > > > > I believe the file-based keyword matching should _not_ be
> > > > > removed and the option should be added for it like I suggested.
> > > >
> > > > Having a command line flag allowing get_maintainer.pl
> > > > users to decide the behavior of K: is weird to me. If I'm a maintainer setting
> > > > my K: in MAINTAINERS I want some sort of consistent behavior. Some
> > > > patches will start hitting mailing list that DO have keywords in the patch
> > > > and others, confusingly, not.
> > >
> > > Not true.
> > >
> > > If a patch contains a keyword match, get_maintainers will _always_
> > > show the K: keyword maintainers unless --nokeywords is specified
> > > on the command line.
> >
> > ...
> >
> > >
> > > If a file contains a keyword match, it'll only show the K:
> > > keyword if --keywords-in-file is set.
> >
> > Right, what I'm saying is a patch can arrive in a maintainer's inbox
> > wherein the patch itself has no mention of the keyword (if
> > get_maintainer user opted for --keywords-in-file). Just trying to
> > avoid some cases of the question: "Why is this in my inbox?"
>
> Because the script user specifically asked for it.
>
> > > > To note, we get some speed-up here as pattern matching a patch that
> > > > touches lots of files would result in searching all of them in their
> > > > entirety. Just removing this behavior _might_ have a measurable
> > > > speed-up for patch series touching dozens of files.
> > >
> > > Again, not true.
> > >
> > > Patches do _not_ scan the original modified files for keyword matches.
> > > Only the patch itself is scanned. That's the current behavior as well.
> > >
> >
> > Feel like I'm missing something here. How is K: matching keywords in
> > files without reading them.
> >
> > If my patch touches 10 files then all 10 of those files are scanned for
> > K: matches right?
>
> Nope.
>
> Understand the patches are the input to get_maintainer and not
> just files.
>
> If a patch is fed to get_maintainer then any files modified by
> the patch are _not_ scanned.
>
> Only the patch _content_ is used for keyword matches.
>

Got it. I'll roll your patch into a v3.

Thanks
Justin

2023-10-05 20:06:25

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH] get_maintainer/MAINTAINERS: confine K content matching to patches

On Thu, 2023-10-05 at 12:52 -0700, Justin Stitt wrote:
> On Thu, Oct 5, 2023 at 11:42 AM Joe Perches <[email protected]> wrote:
> >
> > On Thu, 2023-10-05 at 11:30 -0700, Justin Stitt wrote:
> > > On Thu, Oct 5, 2023 at 11:15 AM Joe Perches <[email protected]> wrote:
> > > >
> > > > On Thu, 2023-10-05 at 11:06 -0700, Justin Stitt wrote:
> > > > > On Wed, Oct 4, 2023 at 7:40 PM Joe Perches <[email protected]> wrote:
> > > > > >
> > > > > > On Wed, 2023-10-04 at 21:21 +0000, Justin Stitt wrote:
> > > > > > > The current behavior of K: is a tad bit noisy. It matches against the
> > > > > > > entire contents of files instead of just against the contents of a
> > > > > > > patch.
> > > > > > >
> > > > > > > This means that a patch with a single character change (fixing a typo or
> > > > > > > whitespace or something) would still to/cc maintainers and lists if the
> > > > > > > affected file matched against the regex pattern given in K:. For
> > > > > > > example, if a file has the word "clang" in it then every single patch
> > > > > > > touching that file will to/cc Nick, Nathan and some lists.
> > > > > > >
> > > > > > > Let's change this behavior to only content match against patches
> > > > > > > (subjects, message, diff) as this is what most people expect the
> > > > > > > behavior already is. Most users of "K:" would prefer patch-only content
> > > > > > > matching. If this is not the case let's add a new matching type as
> > > > > > > proposed in [1].
> > > > > >
> > > > > > I'm glad to know you are coming around to my suggestion.
> > > > > :)
> > > > >
> > > > > >
> > > > > > I believe the file-based keyword matching should _not_ be
> > > > > > removed and the option should be added for it like I suggested.
> > > > >
> > > > > Having a command line flag allowing get_maintainer.pl
> > > > > users to decide the behavior of K: is weird to me. If I'm a maintainer setting
> > > > > my K: in MAINTAINERS I want some sort of consistent behavior. Some
> > > > > patches will start hitting mailing list that DO have keywords in the patch
> > > > > and others, confusingly, not.
> > > >
> > > > Not true.
> > > >
> > > > If a patch contains a keyword match, get_maintainers will _always_
> > > > show the K: keyword maintainers unless --nokeywords is specified
> > > > on the command line.
> > >
> > > ...
> > >
> > > >
> > > > If a file contains a keyword match, it'll only show the K:
> > > > keyword if --keywords-in-file is set.
> > >
> > > Right, what I'm saying is a patch can arrive in a maintainer's inbox
> > > wherein the patch itself has no mention of the keyword (if
> > > get_maintainer user opted for --keywords-in-file). Just trying to
> > > avoid some cases of the question: "Why is this in my inbox?"
> >
> > Because the script user specifically asked for it.
> >
> > > > > To note, we get some speed-up here as pattern matching a patch that
> > > > > touches lots of files would result in searching all of them in their
> > > > > entirety. Just removing this behavior _might_ have a measurable
> > > > > speed-up for patch series touching dozens of files.
> > > >
> > > > Again, not true.
> > > >
> > > > Patches do _not_ scan the original modified files for keyword matches.
> > > > Only the patch itself is scanned. That's the current behavior as well.
> > > >
> > >
> > > Feel like I'm missing something here. How is K: matching keywords in
> > > files without reading them.
> > >
> > > If my patch touches 10 files then all 10 of those files are scanned for
> > > K: matches right?
> >
> > Nope.
> >
> > Understand the patches are the input to get_maintainer and not
> > just files.
> >
> > If a patch is fed to get_maintainer then any files modified by
> > the patch are _not_ scanned.
> >
> > Only the patch _content_ is used for keyword matches.
> >
>
> Got it. I'll roll your patch into a v3.
>

Actually, I have a slightly improved patch as
the actual keyword is shown too.

I'll get it uploaded and make sure you are credited
with the effort to make the change.

cheers, Joe

2023-10-05 20:11:26

by Justin Stitt

[permalink] [raw]
Subject: Re: [PATCH] get_maintainer/MAINTAINERS: confine K content matching to patches

On Thu, Oct 5, 2023 at 1:05 PM Joe Perches <[email protected]> wrote:
>
> On Thu, 2023-10-05 at 12:52 -0700, Justin Stitt wrote:
> > On Thu, Oct 5, 2023 at 11:42 AM Joe Perches <[email protected]> wrote:
> > >
> > > On Thu, 2023-10-05 at 11:30 -0700, Justin Stitt wrote:
> > > > On Thu, Oct 5, 2023 at 11:15 AM Joe Perches <[email protected]> wrote:
> > > > >
> > > > > On Thu, 2023-10-05 at 11:06 -0700, Justin Stitt wrote:
> > > > > > On Wed, Oct 4, 2023 at 7:40 PM Joe Perches <[email protected]> wrote:
> > > > > > >
> > > > > > > On Wed, 2023-10-04 at 21:21 +0000, Justin Stitt wrote:
> > > > > > > > The current behavior of K: is a tad bit noisy. It matches against the
> > > > > > > > entire contents of files instead of just against the contents of a
> > > > > > > > patch.
> > > > > > > >
> > > > > > > > This means that a patch with a single character change (fixing a typo or
> > > > > > > > whitespace or something) would still to/cc maintainers and lists if the
> > > > > > > > affected file matched against the regex pattern given in K:. For
> > > > > > > > example, if a file has the word "clang" in it then every single patch
> > > > > > > > touching that file will to/cc Nick, Nathan and some lists.
> > > > > > > >
> > > > > > > > Let's change this behavior to only content match against patches
> > > > > > > > (subjects, message, diff) as this is what most people expect the
> > > > > > > > behavior already is. Most users of "K:" would prefer patch-only content
> > > > > > > > matching. If this is not the case let's add a new matching type as
> > > > > > > > proposed in [1].
> > > > > > >
> > > > > > > I'm glad to know you are coming around to my suggestion.
> > > > > > :)
> > > > > >
> > > > > > >
> > > > > > > I believe the file-based keyword matching should _not_ be
> > > > > > > removed and the option should be added for it like I suggested.
> > > > > >
> > > > > > Having a command line flag allowing get_maintainer.pl
> > > > > > users to decide the behavior of K: is weird to me. If I'm a maintainer setting
> > > > > > my K: in MAINTAINERS I want some sort of consistent behavior. Some
> > > > > > patches will start hitting mailing list that DO have keywords in the patch
> > > > > > and others, confusingly, not.
> > > > >
> > > > > Not true.
> > > > >
> > > > > If a patch contains a keyword match, get_maintainers will _always_
> > > > > show the K: keyword maintainers unless --nokeywords is specified
> > > > > on the command line.
> > > >
> > > > ...
> > > >
> > > > >
> > > > > If a file contains a keyword match, it'll only show the K:
> > > > > keyword if --keywords-in-file is set.
> > > >
> > > > Right, what I'm saying is a patch can arrive in a maintainer's inbox
> > > > wherein the patch itself has no mention of the keyword (if
> > > > get_maintainer user opted for --keywords-in-file). Just trying to
> > > > avoid some cases of the question: "Why is this in my inbox?"
> > >
> > > Because the script user specifically asked for it.
> > >
> > > > > > To note, we get some speed-up here as pattern matching a patch that
> > > > > > touches lots of files would result in searching all of them in their
> > > > > > entirety. Just removing this behavior _might_ have a measurable
> > > > > > speed-up for patch series touching dozens of files.
> > > > >
> > > > > Again, not true.
> > > > >
> > > > > Patches do _not_ scan the original modified files for keyword matches.
> > > > > Only the patch itself is scanned. That's the current behavior as well.
> > > > >
> > > >
> > > > Feel like I'm missing something here. How is K: matching keywords in
> > > > files without reading them.
> > > >
> > > > If my patch touches 10 files then all 10 of those files are scanned for
> > > > K: matches right?
> > >
> > > Nope.
> > >
> > > Understand the patches are the input to get_maintainer and not
> > > just files.
> > >
> > > If a patch is fed to get_maintainer then any files modified by
> > > the patch are _not_ scanned.
> > >
> > > Only the patch _content_ is used for keyword matches.
> > >
> >
> > Got it. I'll roll your patch into a v3.
> >
>
> Actually, I have a slightly improved patch as
> the actual keyword is shown too.
>
> I'll get it uploaded and make sure you are credited
> with the effort to make the change.
>

Dang, we just collided in mid-air. I just sent a new patch.
Let's disregard my patch that was sent.

Thanks for the efforts here. I appreciate it.

> cheers, Joe

Thanks
Justin