From: Mauro Carvalho Chehab
To: Linux Doc Mailing List
Cc: Mauro Carvalho Chehab, Mauro Carvalho Chehab, linux-kernel@vger.kernel.org, Jonathan Corbet
Subject: [PATCH v2 06/11] scripts/documentation-file-ref-check: rewrite it in perl with auto-fix mode
Date: Wed, 9 May 2018 10:18:49 -0300

The original shell script works, but:

1) it is too slow;
2) it is hard to exclude regex patterns.

Convert it to perl. The new version is able to check the entire
tree in less than a second (once the tree is cached):

	real	0m0,284s
	user	0m0,668s
	sys	0m0,778s

The old version takes more than a minute to complete (also once
cached):

	real	1m17,905s
	user	0m25,583s
	sys	0m55,334s

It also produces fewer false positives (if any).

The new script also contains an auto-fix mode. Usually, file
references get lost when the files are moved to some other place
and/or renamed to .rst. Add an experimental mode to auto-fix
those.

Signed-off-by: Mauro Carvalho Chehab
---
 scripts/documentation-file-ref-check | 125 ++++++++++++++++++++++++---
 1 file changed, 113 insertions(+), 12 deletions(-)

diff --git a/scripts/documentation-file-ref-check b/scripts/documentation-file-ref-check
index bc1659900e89..2520bc14ffac 100755
--- a/scripts/documentation-file-ref-check
+++ b/scripts/documentation-file-ref-check
@@ -1,15 +1,116 @@
-#!/bin/sh
+#!/usr/bin/env perl
+# SPDX-License-Identifier: GPL-2.0
+#
 # Treewide grep for references to files under Documentation, and report
 # non-existing files in stderr.
 
-for f in $(git ls-files); do
-	for ref in $(grep -ho "Documentation/[A-Za-z0-9_.,~/*+-]*" "$f"); do
-		# presume trailing . and , are not part of the name
-		ref=${ref%%[.,]}
-
-		# use ls to handle wildcards
-		if ! ls $ref >/dev/null 2>&1; then
-			echo "$f: $ref" >&2
-		fi
-	done
-done
+use warnings;
+use strict;
+use Getopt::Long qw(:config no_auto_abbrev);
+
+my $scriptname = $0;
+$scriptname =~ s,.*/([^/]+/),$1,;
+
+# Parse arguments
+my $help = 0;
+my $fix = 0;
+
+GetOptions(
+	'fix' => \$fix,
+	'h|help|usage' => \$help,
+);
+
+if ($help != 0) {
+	print "$scriptname [--help] [--fix]\n";
+	exit -1;
+}
+
+# Step 1: find broken references
+print "Finding broken references. This may take a while... " if ($fix);
+
+my %broken_ref;
+
+open IN, "git grep 'Documentation/'|"
+	or die "Failed to run git grep";
+while (<IN>) {
+	next if (!m/^([^:]+):(.*)/);
+
+	my $f = $1;
+	my $ln = $2;
+
+	# Makefiles contain nasty expressions to parse docs
+	next if ($f =~ m/Makefile/);
+	# Skip this script
+	next if ($f eq $scriptname);
+
+	if ($ln =~ m,\b(\S*)(Documentation/[A-Za-z0-9\_\.\,\~/\*+-]*),) {
+		my $prefix = $1;
+		my $ref = $2;
+		my $base = $2;
+
+		$ref =~ s/[\,\.]+$//;
+
+		my $fulref = "$prefix$ref";
+
+		$fulref =~ s/^(\ [...] 1) {
+		print STDERR "WARNING: Won't auto-replace, as found multiple files close to $ref:\n";
+		foreach my $j (@find) {
+			$j =~ s,^./,,;
+			print STDERR "    $j\n";
+		}
+	} else {
+		$f = $find[0];
+		$f =~ s,^./,,;
+		print "INFO: Replacing $ref to $f\n";
+		foreach my $j (qx(git grep -l $ref)) {
+			qx(sed "s\@$ref\@$f\@g" -i $j);
+		}
+	}
+}
-- 
2.17.0
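
One behavioural difference between the two versions is how trailing punctuation is handled. The removed shell line `ref=${ref%%[.,]}` drops at most one trailing `.` or `,`, because the pattern `[.,]` matches a single character, whereas the Perl substitution `s/[\,\.]+$//` removes a whole run of them. A minimal shell sketch of the Perl behaviour follows; the helper name `strip_trailing_punct` and the sample references are illustrative assumptions, not part of the patch.

```shell
#!/bin/sh
# Hypothetical helper (not in the patch): strip every trailing '.' or ','
# from a reference, mirroring the Perl substitution s/[\,\.]+$//.
strip_trailing_punct() {
	ref=$1
	while :; do
		case $ref in
		*[.,]) ref=${ref%[.,]} ;;	# drop one trailing '.' or ','
		*) break ;;			# nothing left to strip
		esac
	done
	printf '%s\n' "$ref"
}

# Sample references, made up for illustration.
strip_trailing_punct 'Documentation/foo.txt.'
strip_trailing_punct 'Documentation/bar.rst.,'
```

Dots inside the name are left alone; only the trailing run is removed.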
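
The auto-fix step rewrites each broken reference with `sed`, using `@` as the substitution delimiter so that the slashes in paths need no escaping. The sketch below shows only that substitution, applied to standard input rather than in place; the function name `rewrite_ref` and the example paths are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not in the patch): replace an old reference with a
# new path in the text read from stdin.
# $1 = old reference, $2 = new path.
rewrite_ref() {
	# '@' is the delimiter, so '/' in the paths is taken literally.
	# Caveat: $1 is still a regular expression, so '.' matches any character.
	sed "s@$1@$2@g"
}

# Example paths are made up for illustration.
printf 'See %s for details.\n' 'Documentation/foo.txt' |
	rewrite_ref 'Documentation/foo.txt' 'Documentation/admin-guide/foo.rst'
```

Because the reference is interpolated unquoted into the `sed` expression, a path containing `@` or regex metacharacters could match more than intended, which is one reason to double-check the results of an auto-fix run.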