Message-ID: <92afdf33e22e8a63f6baaaba94c004cf2ec5a7d7.camel@perches.com>
Subject: Re: [PATCH] get_maintainer: Extend matched name characters in maintainers_in_file()
From: Joe Perches
To: Janne Grunau
Cc: linux-kernel@vger.kernel.org, Martin Povišer
Date: Sun, 18 Sep 2022 10:03:17 -0700
References: <20220916084712.84411-1-j@jannau.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, 2022-09-17 at 07:11 -0700, Joe Perches wrote:
> On Fri, 2022-09-16 at 10:47 +0200, Janne Grunau wrote:
> > Extend the regexp matching name characters to cover Unicode blocks Latin
> > Extended-A and Extended-B.
> > Fixes 'scripts/get_maintainer.pl -f' for
> > 'Documentation/devicetree/bindings/clock/apple,nco.yaml'.
> >
> > Signed-off-by: Janne Grunau
> >
> > ---
> > This still excludes Greek and Cyrillic characters which should be
> > expected in names as well. I tried to use '\p{L}' to match all Unicode
> > letters but couldn't get it to work. Feel free to understand this as a bug
> > report with an incomplete fix.
>
> Maybe use \p{XPosixAlpha} ?
>
> but I don't know what version of perl introduced this.
>
> > diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
> []
> > @@ -442,7 +442,7 @@ sub maintainers_in_file {
> >  	    my $text = do { local($/) ; <$f> };
> >  	    close($f);
> >
> > -	    my @poss_addr = $text =~ m$[A-Za-zÀ-ÿ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> > +	    my @poss_addr = $text =~ m$[A-Za-zÀ-ɏ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
>
> 	    my @poss_addr = $text =~ m$[\p{XPosixAlpha}\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;

Using variations of \p{posix} doesn't seem to work for at least perl 5.34.
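
One possible explanation, offered as a guess rather than a diagnosis: if the file contents are matched as undecoded UTF-8 bytes, a multi-byte letter such as U+0161 is seen as two separate Latin-1 code points, and the second one is not a letter. The sketch below is self-contained and does not touch get_maintainer.pl; the names %class and matches() are hypothetical, and whether the script actually reads files without a decoding layer is an assumption.

```perl
use strict;
use warnings;
use Encode qw(decode);

# UTF-8 encoding of U+0161 (s with caron), written as raw bytes so the
# example does not depend on the encoding of this source file.
my $bytes = "\xC5\xA1";

# The same data after explicit decoding: a single code point.
my $chars = decode('UTF-8', $bytes);

# Hypothetical lookup table of anchored property tests.
my %class = (
    L     => qr/^\p{L}+$/,
    Alpha => qr/^\p{XPosixAlpha}+$/,
);

# Hypothetical helper: 1 if every element of $str has the named property.
sub matches {
    my ($str, $name) = @_;
    return $str =~ $class{$name} ? 1 : 0;
}

# On the byte string, 0xC5 is a letter but 0xA1 is punctuation,
# so neither class matches the whole string.
for my $name (sort keys %class) {
    printf "%-5s bytes=%d chars=%d\n",
        $name, matches($bytes, $name), matches($chars, $name);
}
```

If that assumption holds, decoding the input before matching would be an alternative to widening the character class.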

\p{print} seems to work for Documentation/devicetree/bindings/clock/apple,nco.yaml,
but I don't know how fragile it is.

\p{print} might be too greedy...

---
 scripts/get_maintainer.pl | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
index ab123b498fd9..790112c3e1d7 100755
--- a/scripts/get_maintainer.pl
+++ b/scripts/get_maintainer.pl
@@ -442,7 +442,7 @@ sub maintainers_in_file {
 	    my $text = do { local($/) ; <$f> };
 	    close($f);
 
-	    my @poss_addr = $text =~ m$[A-Za-zÀ-ÿ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
+	    my @poss_addr = $text =~ m$[\p{print}\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
 	    push(@file_emails, clean_file_emails(@poss_addr));
 	}
     }
@@ -2456,11 +2456,12 @@ sub clean_file_emails {
     foreach my $email (@file_emails) {
 	$email =~ s/[\(\<\{]{0,1}([A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+)[\)\>\}]{0,1}/\<$1\>/g;
 	my ($name, $address) = parse_email($email);
+	$name =~ s/^\p{space}*\p{punct}*\p{space}*//;
 	if ($name eq '"[,\.]"') {
 	    $name = "";
 	}
 
-	my @nw = split(/[^A-Za-zÀ-ÿ\'\,\.\+-]/, $name);
+	my @nw = split(/[^\p{print}\'\,\.\+-]/, $name);
 	if (@nw > 2) {
 	    my $first = $nw[@nw - 3];
 	    my $middle = $nw[@nw - 2];
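
To make the "too greedy" worry concrete, here is a small standalone sketch of the two lines changed in clean_file_emails(). It is not the script itself: $old_split is a simplified ASCII-only stand-in for the old class, and strip_leading() is a hypothetical wrapper around the added substitution. It suggests that because a plain space is itself printable, the \p{print} version of the split no longer breaks a name into words.

```perl
use strict;
use warnings;

# Simplified stand-in for the old word-splitting class (ASCII letters only).
my $old_split = qr/[^A-Za-z\'\,\.\+-]/;

# The class used by the split in the patch above.
my $new_split = qr/[^\p{print}\'\,\.\+-]/;

# Hypothetical wrapper around the substitution added by the patch:
# drop leading whitespace and punctuation picked up by the wider match.
sub strip_leading {
    my ($name) = @_;
    $name =~ s/^\p{space}*\p{punct}*\p{space}*//;
    return $name;
}

my $name = strip_leading(' - Janne Grunau');

# The old class splits on the space; the new one treats the space as
# part of the name, so the whole string stays in one piece.
my @old = split($old_split, $name);
my @new = split($new_split, $name);

print "name: '$name'\n";
print "old split: ", scalar(@old), " word(s)\n";
print "new split: ", scalar(@new), " word(s)\n";
```

With a single element in @nw, the "if (@nw > 2)" branch that trims names down to first, middle and last would not be taken for names like this one.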