Subject: Re: [PATCH] get_maintainer: Extend matched name characters in maintainers_in_file()
From: Joe Perches
To: Janne Grunau
Cc: linux-kernel@vger.kernel.org
Date: Sat, 17 Sep 2022 07:11:52 -0700
In-Reply-To: <20220916084712.84411-1-j@jannau.net>
References: <20220916084712.84411-1-j@jannau.net>

On Fri, 2022-09-16 at 10:47 +0200, Janne Grunau wrote:
> Extend the regexp matching name characters to cover Unicode blocks Latin
> Extended-A and Extended-B.
> Fixes 'scripts/get_maintainer.pl -f' for
> 'Documentation/devicetree/bindings/clock/apple,nco.yaml'.
>
> Signed-off-by: Janne Grunau
> ---
> This still excludes Greek and Cyrillic characters which should be
> expected in names as well. I tried to use '\p{L}' to match all Unicode
> letters but couldn't get it to work. Feel free to understand this as a bug
> report with an incomplete fix.

Maybe use \p{XPosixAlpha} ?

but I don't know what version of perl introduced this.

> diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
[]
> @@ -442,7 +442,7 @@ sub maintainers_in_file {
>  	    my $text = do { local($/) ; <$f> };
>  	    close($f);
>
> -	    my @poss_addr = $text =~ m$[A-Za-zÀ-ÿ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> +	    my @poss_addr = $text =~ m$[A-Za-zÀ-ɏ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;

	    my @poss_addr = $text =~ m$[\p{XPosixAlpha}\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;

?

>  	    push(@file_emails, clean_file_emails(@poss_addr));
>  	}
>      }
> @@ -2460,7 +2460,7 @@ sub clean_file_emails {
>  	    $name = "";
>  	}
>
> -	my @nw = split(/[^A-Za-zÀ-ÿ\'\,\.\+-]/, $name);
> +	my @nw = split(/[^A-Za-zÀ-ɏ\'\,\.\+-]/, $name);

Maybe here too

> +	my @nw = split(/[^\p{XPosixAlpha}\'\,\.\+-]/, $name);

Dunno, haven't tested. Maybe you care to test?
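
For anyone who wants to try the \p{XPosixAlpha} idea outside the script, here
is a minimal standalone sketch. It has not been run against the kernel tree.
It assumes the input is a decoded character string (here via `use utf8` on
literals); whether get_maintainer.pl decodes the files it reads is not
established in this thread, and matching against raw UTF-8 bytes might be one
reason '\p{L}' appeared not to work. The helper names find_addresses() and
name_words() and the sample name and address are made up for illustration.
The two patterns are copied from the quoted hunks with only the name
character class changed, and '!' is used as the match delimiter instead of '$'.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                              # sample strings contain non-ASCII literals
binmode(STDOUT, ':encoding(UTF-8)');   # print characters as UTF-8

# Hypothetical helper: the maintainers_in_file() pattern from the quoted
# hunk, with the name character class replaced by \p{XPosixAlpha}.
# Assumes $text holds decoded characters, not raw UTF-8 bytes.
sub find_addresses {
    my ($text) = @_;
    return $text =~ m![\p{XPosixAlpha}\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}!g;
}

# Hypothetical helper: the clean_file_emails() split from the quoted hunk,
# again with \p{XPosixAlpha}; empty fields are dropped.
sub name_words {
    my ($name) = @_;
    return grep { length } split(/[^\p{XPosixAlpha}\'\,\.\+-]/, $name);
}

# Made-up sample in the style of a YAML maintainers list.
my $sample = "maintainers:\n  - Łukasz Šimić <lukasz\@example.org>\n";

for my $addr (find_addresses($sample)) {
    $addr =~ s/^[\s-]+//;              # trim the list marker, for display only
    print "address: $addr\n";
}
print "words: ", join('|', name_words('Łukasz Šimić')), "\n";
```

Putting the old À-ÿ range back into either helper should show the difference,
since Ł, Š and ć all lie outside that range and the name then gets cut at
those letters.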