Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754615AbbLDBjx (ORCPT ); Thu, 3 Dec 2015 20:39:53 -0500 Received: from mail-pa0-f46.google.com ([209.85.220.46]:35057 "EHLO mail-pa0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753487AbbLDBjw (ORCPT ); Thu, 3 Dec 2015 20:39:52 -0500 Date: Thu, 3 Dec 2015 17:39:48 -0800 From: Brian Norris To: Joe Perches Cc: Andy Whitcroft , linux-kernel@vger.kernel.org Subject: Re: [BUG] checkpatch: false positive for commits with quote characters Message-ID: <20151204013948.GB116589@google.com> References: <20151116224321.GR8456@google.com> <1447782507.6012.22.camel@perches.com> <20151117180336.GE8456@google.com> <20151204001318.GA57739@google.com> <1449188980.17296.18.camel@perches.com> <1449189388.17296.20.camel@perches.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <1449189388.17296.20.camel@perches.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4030 Lines: 105 On Thu, Dec 03, 2015 at 04:36:28PM -0800, Joe Perches wrote: > On Thu, 2015-12-03 at 16:29 -0800, Joe Perches wrote: > > On Thu, 2015-12-03 at 16:13 -0800, Brian Norris wrote: > > > Ping? I've hit some different false positives today on the same rule. > > > I'll stop bothering to report them if no one cares. > > > > Perhaps this: Aside from the non-breaking spaces your copy of Evolution inserted, it's an improvement. (It doesn't give a false positive for the patch I reported.) But it's still got some holes. > (minus the debugging this time...) > --- > ?scripts/checkpatch.pl | 8 ++++---- > ?1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl > index d4960f7..b23dff8 100755 > --- a/scripts/checkpatch.pl > +++ b/scripts/checkpatch.pl > @@ -2397,20 +2397,20 @@ sub process { > ? $long = 1 if ($line =~ /\bcommit\s+[0-9a-f]{41,}/i); > ? $space = 0 if ($line =~ /\bcommit [0-9a-f]/i); > ? $case = 0 if ($line =~ /\b[Cc]ommit\s+[0-9a-f]{5,40}[^A-F]/); > - if ($line =~ /\bcommit\s+[0-9a-f]{5,}\s+\("([^"]+)"\)/i) { > + if ($line =~ /\bcommit\s+[0-9a-f]{5,}\s+\("(.*)"\)/i) { You're got the reverse problem now. This will match too broadly. For instance, if there is another instance of terminating quote + parentheses on the same line, then we'll get a false postive. e.g., if I wrote a patch that included something like this: Commit 6dc0dcde406b ("parisc: Use platform_device_register_simple("rtc-generic")") is cool (as in "cool") Or even if it's wrapped like this: Commit 6dc0dcde406b ("parisc: Use platform_device_register_simple("rtc-generic")") is cool (as in "cool") Then the regexes will think the description was: parisc: Use platform_device_register_simple("rtc-generic")") is cool (as in "cool A hacky workaround for that one: only check for (loosely, not proper regex syntax): parsed_description =~ description . ")"; rather than: description == parsed_description Not sure how far you want to go on chasing false positives... > ? $orig_desc = $1; > ? $hasparens = 1; > ? } elsif ($line =~ /\bcommit\s+[0-9a-f]{5,}\s*$/i && > ? ?defined $rawlines[$linenr] && > - ?$rawlines[$linenr] =~ /^\s*\("([^"]+)"\)/) { > + ?$rawlines[$linenr] =~ /^\s*\("(.*)"\)/) { > ? $orig_desc = $1; > ? $hasparens = 1; > ? } elsif ($line =~ /\bcommit\s+[0-9a-f]{5,}\s+\("[^"]+$/i && > ? ?defined $rawlines[$linenr] && > - ?$rawlines[$linenr] =~ /^\s*[^"]+"\)/) { > + ?$rawlines[$linenr] =~ /^\s*.*"\)/) { > ? $line =~ /\bcommit\s+[0-9a-f]{5,}\s+\("([^"]+)$/i; > ? $orig_desc = $1; > - $rawlines[$linenr] =~ /^\s*([^"]+)"\)/; > + $rawlines[$linenr] =~ /^\s*(.*)"\)/; > ? $orig_desc .= " " . $1; > ? $hasparens = 1; > ? } BTW, another false positive: just including this text in a commit triggers a different one: http://lkml.kernel.org/g/20151113220039.GA74382@google.com A simple hack (appended, in addition to yours) would be to assume that when people are trying to include badly-formatted commit hashes, they will be preceding them with whitespace, the beginning of a line, or encapsulating punctuation. Or use a URL parser. Brian diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index d2993f19df3f..e7110ba3242b 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -2349,7 +2349,7 @@ sub process { # Check for git id commit length and improperly formed commit descriptions if ($in_commit_log && !$commit_log_possible_stack_dump && ($line =~ /\bcommit\s+[0-9a-f]{5,}\b/i || - ($line =~ /\b[0-9a-f]{12,40}\b/i && + ($line =~ /\b[\s^\("][0-9a-f]{12,40}\b/i && $line !~ /[\<\[][0-9a-f]{12,40}[\>\]]/i && $line !~ /\bfixes:\s*[0-9a-f]{12,40}/i))) { my $init_char = "c"; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/