Received: by 2002:a05:6a10:6744:0:0:0:0 with SMTP id w4csp5780461pxu; Thu, 22 Oct 2020 10:50:11 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyYGCNkhDuVsnSaeTS+AT2Z+78gf/J+Co/+l9HKJh+rwNeiDysiVZBHBtZ3oqt/DiA6Vebf X-Received: by 2002:a17:906:5e49:: with SMTP id b9mr3341295eju.436.1603389011064; Thu, 22 Oct 2020 10:50:11 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1603389011; cv=none; d=google.com; s=arc-20160816; b=RDHrbiBD4wPkmmWEXaaaqJhyUwfH4wT+MDFcTBLP39ZRw4k8zDDOrpa4JXRbkEE3et xo+s9Vy8W9MhjZ+FAE8DrwWtTHtjNFcUgzk5sTRyxOQ1SvoCr+uvpIAkC5RkfTxtq1bm 4yqmV/dluD8obuTxe120CHH9VScUg7w6rkyaN26NIUrMEyb8RwkNttdsvrVVH+1RDDJ1 9oFhF7NE01HzPJ0ji1H6QHcdJTRYeIDrLjDqRtsGyT/IE5hlAHmW4Vjh3QZDyEZzIixs EdjkLYDJjxRrMaXIg3R+kJvximYafSwmcE/5h2N2GIpWm9dvRwLSXzlCj2zyuum6fkXb bbfw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject:dkim-signature; bh=WD0jtBNTjlU9AN49fhz5jjiyP58Uz1jlaJhU9PAdtjM=; b=h6e9Z2fmlCT47tYCXmnzjEkyNgQskTXnJJ8VjjFeoVa7g1obo+X88ioID2XTEacmGu T74WjLh1TaAtpKJYKCq2J0q2Ti4L4DQAX9s4q5MX2SebUWNR9m1K4sS1N7pqVONYGbU5 vzsyb+J/D3sN3YapNnhe+yPi5u+RHivYjngc6VMB78OlLG5sm81aaBHRVTHofl3S344X /RYw6vVekK5LMT+RKCW9nPu/OBCcB8nmxRTDm5J8NqqfHPB0isws50XQAjz5wmZSoVvE CSU1Oe6dPbqEZg24Uwx3SoK/6kVHUc1XnoLTLDbbRSRHMxazt1/jFuaPO4fXBBbmPpWE +4yg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=dcCPde1C; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id p14si1446991eju.470.2020.10.22.10.49.48; Thu, 22 Oct 2020 10:50:11 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=dcCPde1C; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S368302AbgJVO6S (ORCPT + 99 others); Thu, 22 Oct 2020 10:58:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37830 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S368201AbgJVO6S (ORCPT ); Thu, 22 Oct 2020 10:58:18 -0400 Received: from mail-pl1-x641.google.com (mail-pl1-x641.google.com [IPv6:2607:f8b0:4864:20::641]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6A725C0613CE for ; Thu, 22 Oct 2020 07:58:18 -0700 (PDT) Received: by mail-pl1-x641.google.com with SMTP id w11so1087322pll.8 for ; Thu, 22 Oct 2020 07:58:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=WD0jtBNTjlU9AN49fhz5jjiyP58Uz1jlaJhU9PAdtjM=; b=dcCPde1CvGRMz3wM7wYha4dqt3k7ghjEmbmwUrBkqVEzqFi9K4UX/sszrLKlLCEdjt OgVgu1x8IXbs6uqlDQzISq8JMv+PuwNHidl27bE/2C66q2NnEhBFdc7eBTcZnJTNeRMA 1PEgEwX+HGZqDxwypiyGUe+HoF5+41CAq4ZAFAvvDwKs3+kHsqMolsFCXFT+vq+tUyKx 6X93LSWyzHbcjHZWed9Ub5YRag+TodepPKSMDnE4R3sCfQEu+wsR7IqUiYiv+gnF+COE ++F/v85PIHDcu9U3U6jPnHusikRmF+Xe51R1wO3BJdOUShqhNAARaoFT9VhnnRY9dATr HY5w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=WD0jtBNTjlU9AN49fhz5jjiyP58Uz1jlaJhU9PAdtjM=; b=Vinq/gNGCzBLCbeD8+qSnnL+9/jwly5+Ux3Px+leQy2lllLo9wZZIqzTpXIur7yiQT k+0qOTS8GWDkm4vSTx30ng0tMeQxFEALdaWU9q/i4tclqXkIbptD5ws9qglNsMdD8GJ6 h5+aB++lH1PNiX6y9v5yBJ2VxrqJd8ZzGLqPIJc3AsfqBEB5ti+r8aAIkH5NhaTvYsde 5YncteANtlYbtjtI1RS8GybG6wK4K8xOLcTC5JIB43LSOfOl/kvlFJ3azdLF1LPS3Tl1 BeqEW1CL1Iam7P4AIkg4+KWkFcmIqPlSijf7fTKgmzOZ33tKuOz+gs+Gngq1SB0JihDC KQJg== X-Gm-Message-State: AOAM5325x97mSuZc8YVG51kmpqP70Roj3yizhQtA0aEPnnxjTy2KqFwG 2AMfvBuw3MnAXVBkjxEM0U8= X-Received: by 2002:a17:902:708a:b029:d4:cf7c:6c59 with SMTP id z10-20020a170902708ab02900d4cf7c6c59mr3103971plk.52.1603378697733; Thu, 22 Oct 2020 07:58:17 -0700 (PDT) Received: from ?IPv6:2402:3a80:41a:7419:e1bd:6bc1:d06a:efd1? ([2402:3a80:41a:7419:e1bd:6bc1:d06a:efd1]) by smtp.gmail.com with ESMTPSA id b5sm2607693pfo.64.2020.10.22.07.58.14 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 22 Oct 2020 07:58:17 -0700 (PDT) Subject: Re: [PATCH v2] checkpatch: fix false positives in REPEATED_WORD warning To: joe@perches.com Cc: linux-kernel@vger.kernel.org, linux-kernel-mentees@lists.linuxfoundation.org, lukas.bulwahn@gmail.com, dwaipayanray1@gmail.com References: <20201022145021.28211-1-yashsri421@gmail.com> From: Aditya Message-ID: <1de93783-1faf-95b9-c8bd-2f2bb2f8a010@gmail.com> Date: Thu, 22 Oct 2020 20:28:11 +0530 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 In-Reply-To: <20201022145021.28211-1-yashsri421@gmail.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 22/10/20 8:20 pm, Aditya Srivastava wrote: > Presence of hexadecimal address or symbol results in false warning > message by checkpatch.pl. > > For example, running checkpatch on commit b8ad540dd4e4 ("mptcp: fix > memory leak in mptcp_subflow_create_socket()") results in warning: > > WARNING:REPEATED_WORD: Possible repeated word: 'ff' > 00 00 00 00 00 00 00 00 00 2f 30 0a 81 88 ff ff ........./0..... > > Similarly, the presence of list command output in commit results in > an unnecessary warning. > > For example, running checkpatch on commit 899e5ffbf246 ("perf record: > Introduce --switch-output-event") gives: > > WARNING:REPEATED_WORD: Possible repeated word: 'root' > dr-xr-x---. 12 root root 4096 Apr 27 17:46 .. > > Here, it reports 'ff' and 'root to be repeated, but it is in fact part > of some address or code, where it has to be repeated. > > In these cases, the intent of the warning to find stylistic issues in > commit messages is not met and the warning is just completely wrong in > this case. > > To avoid these warnings, add additional regex check for the > directory permission pattern and avoid checking the line for this > class of warning. Similarly, to avoid hex pattern, check if the word > consists of hex symbols and skip this warning if, it forms a pattern > of repeating sequence or contains more special character after pattern > than non hex characters. > > A quick evaluation on v5.6..v5.8 showed that this fix reduces > REPEATED_WORD warnings from 2797 to 907. > > A quick manual check found all cases are related to hex output or > list command outputs in commit messages. > > Signed-off-by: Aditya Srivastava > --- > scripts/checkpatch.pl | 33 ++++++++++++++++++++++++++++++++- > 1 file changed, 32 insertions(+), 1 deletion(-) > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl > index 9b9ffd876e8a..0f57e99ed670 100755 > --- a/scripts/checkpatch.pl > +++ b/scripts/checkpatch.pl > @@ -3051,7 +3051,10 @@ sub process { > } > > # check for repeated words separated by a single space > - if ($rawline =~ /^\+/ || $in_commit_log) { > +# avoid false positive from list command eg, '-rw-r--r-- 1 root root' > + if (($rawline =~ /^\+/ || $in_commit_log) && > + $rawline !~ /[bcCdDlMnpPs\?-][rwxsStT-]{9}/) { > + my @hex_seq = $rawline =~ /\b[0-9a-f]{2,} \b/g; > while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) { > > my $first = $1; > @@ -3065,6 +3068,34 @@ sub process { > next if ($first ne $second); > next if ($first eq 'long'); > > + # avoid repeating hex occurrences like 'ff ff fe 09 ...' > + if ($first =~ /\b[0-9a-f]{2,}/) { > + # if such sequence occurs more than 4, it is most probably part of some of code > + next if ((scalar @hex_seq)>4); > + > + # for hex occurrences which are less than 4 > + # get first hex word in the line > + if ($rawline =~ /\b[0-9a-f]{2,} /) { > + my $post_hex_seq = $'; > + > + # set suffieciently high default values to avoid ignoring or counting in absence of another > + my $non_hex_char_pos = 1000; > + my $special_chars_pos = 500; > + > + if ($post_hex_seq =~ /[g-z]+/) { > + # first non hex character in post_hex_seq > + $non_hex_char_pos = $-[0]; > + } > + if($post_hex_seq =~ /[^a-zA-Z0-9]{2,}/) { > + # first occurrence of 2 or more special chars > + $special_chars_pos = $-[0]; > + } > + > + # as such occurrences are not common, it is more likely to be a part of some code > + next if ($special_chars_pos<$non_hex_char_pos); > + } > + } > + > if (WARN("REPEATED_WORD", > "Possible repeated word: '$first'\n" . $herecurr) && > $fix) { > I have also generated some reports showing impact caused by the changes: Hex variations causing warning previously: https://github.com/AdityaSrivast/kernel-tasks/blob/master/Task3/hex_variations.txt Change in warnings and errors: https://github.com/AdityaSrivast/kernel-tasks/blob/master/Task3/relative_summary/summary_relative.txt Warnings which are dropped: https://github.com/AdityaSrivast/kernel-tasks/blob/master/Task3/relative_summary/dropped_warnings/summary.txt Thanks Aditya