Message-ID: <0173e76a36b3a9b4e7f324dd3a36fd4a9757f302.camel@perches.com>
Subject: Re: [PATCH v2] get_maintainer: correctly parse UTF-8 encoded names in files
From: Joe Perches
To: Alvin Šipraga
Cc: Alvin Šipraga, Linus Torvalds, Duje Mihanović, Konstantin Ryabitsev, "linux-kernel@vger.kernel.org", Shawn Guo, Andrew Morton
Date: Fri, 15 Dec 2023 10:30:52 -0800
In-Reply-To: <45x65lwhzefxfe7muha6myfqb53ooxvhjpgeqadeiikl5nriws@ekwlxybd6ybp>
References: <20231214-get-maintainers-utf8-v2-1-b188dc7042a4@bang-olufsen.dk> <45x65lwhzefxfe7muha6myfqb53ooxvhjpgeqadeiikl5nriws@ekwlxybd6ybp>

On Fri, 2023-12-15 at 10:30 +0000, Alvin Šipraga wrote:
> On Thu, Dec 14, 2023 at 07:57:54AM -0800, Joe Perches wrote:
> > On Thu, 2023-12-14 at 16:06 +0100, Alvin Šipraga wrote:
> > > @@ -442,7 +443,7 @@ sub maintainers_in_file {
> > >      my $text = do { local($/) ; <$f> };
> > >      close($f);
> > >  
> > > -    my @poss_addr = $text =~ m$[A-Za-zÀ-ÿ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> > > +    my @poss_addr = $text =~ m$[\p{L}\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> > >      push(@file_emails, clean_file_emails(@poss_addr));

Hi again Alvin.

Separate issue, but on the one .yaml file I tried:

$ ./scripts/get_maintainer.pl Documentation/devicetree/bindings/serial/8250.yaml
Greg Kroah-Hartman (supporter:TTY LAYER AND SERIAL DRIVERS)
Jiri Slaby (supporter:TTY LAYER AND SERIAL DRIVERS)
Rob Herring (maintainer:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS)
Krzysztof Kozlowski (maintainer:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS)
Conor Dooley (maintainer:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS)
Lubomir Rintel (in file)
- (in file)
linux-kernel@vger.kernel.org (open list:TTY LAYER AND SERIAL DRIVERS)
linux-serial@vger.kernel.org (open list:TTY LAYER AND SERIAL DRIVERS)
devicetree@vger.kernel.org (open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS)

Note the single '-' in the "name" portion of devicetree@vger.kernel.org

Maybe clean_file_emails needs some better name cleansing code.

> > Rather than open _all_ files in utf-8, perhaps the block
> > that opens a specific file to find maintainers
> > 
> > sub maintainers_in_file {
> >     my ($file) = @_;
> > 
> >     return if ($file =~ m@\bMAINTAINERS$@);
> > 
> >     if (-f $file && ($email_file_emails || $file =~ /\.yaml$/)) {
> >         open(my $f, '<', $file)
> >             or die "$P: Can't open $file: $!\n";
> >         my $text = do { local($/) ; <$f> };
> >         close($f);
> >         ...
> > 
> > should change the
> > 
> >         open(my $f...
> > to
> >         use open qw(:std :encoding(UTF-8));
> >         open(my $f...
> 
> Yes, this also works for parsing the name in an arbitrary file. But with the
> change you suggest above, the script then corrupts my name when it is lifted
> from MAINTAINERS (!?):
> 
> $ ./scripts/get_maintainer.pl -f drivers/net/dsa/realtek/ | grep alsi
> "Alvin Å ipraga" (maintainer:REALTEK RTL83xx SMI DSA ROUTER CHIPS)

Curious. Let me see if I can figure out why that happens.

> If you are still unconvinced then I will gladly send a v3 patching the two cases
> we have discussed (read_maintainer_file() and maintainers_in_file()).

No rush.

> > And unrelated and secondarily, perhaps the
> >     $file =~ /\.yaml$/
> > test should be
> >     $file =~ /\.(?:yaml|dtsi?)$/
> > 
> > to also find any maintainer address in the dts* files
> > 
> > https://lore.kernel.org/lkml/20231028174656.GA3310672@bill-the-cat/T/
> 
> Is this supposed to parse the "Copyright (c) 20xx John Doe " in
> the .dts* files?

Yes, just as it would and does for .yaml files.

$ git grep -P -i 'copy.*\<\w+\@\w+\.\w+\>' -- '*.yaml'
Documentation/devicetree/bindings/display/bridge/chrontel,ch7033.yaml:# Copyright (C) 2019,2020 Lubomir Rintel
Documentation/devicetree/bindings/media/marvell,mmp2-ccic.yaml:# Copyright 2019,2020 Lubomir Rintel
Documentation/devicetree/bindings/misc/olpc,xo1.75-ec.yaml:# Copyright (C) 2019,2020 Lubomir Rintel
Documentation/devicetree/bindings/phy/allwinner,sun50i-h6-usb3-phy.yaml:# Copyright 2019 Ondrej Jirman
Documentation/devicetree/bindings/phy/marvell,mmp3-hsic-phy.yaml:# Copyright 2019 Lubomir Rintel
Documentation/devicetree/bindings/phy/marvell,mmp3-usb-phy.yaml:# Copyright 2019,2020 Lubomir Rintel
Documentation/devicetree/bindings/reset/bitmain,bm1880-reset.yaml:# Copyright 2019 Manivannan Sadhasivam
Documentation/devicetree/bindings/reset/marvell,berlin2-reset.yaml:# Copyright 2015 Antoine Tenart
Documentation/devicetree/bindings/reset/qca,ar7100-reset.yaml:# Copyright 2015 Alban Bedel
Documentation/devicetree/bindings/serial/8250.yaml:# Copyright 2020 Lubomir Rintel
Documentation/devicetree/bindings/spi/marvell,mmp2-ssp.yaml:# Copyright 2019,2020 Lubomir Rintel
Documentation/devicetree/bindings/usb/marvell,pxau2o-ehci.yaml:# Copyright 2019,2020 Lubomir Rintel

> But sure, I can do a resend of Shawn's original patch
> separately if you like.

Yes please. Make sure to cc Andrew Morton.
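
For reference, a minimal standalone sketch of two points from the thread. It is not code from get_maintainer.pl, and the helper names scans_for_in_file_emails and name_is_all_letters are hypothetical. The first helper applies the proposed /\.(?:yaml|dtsi?)$/ test to a few made-up file names. The second assumes a simplified "letters and spaces only" name pattern (not the script's full address regex) to show that \p{L} only sees the name as letters once the file contents have been decoded from UTF-8 into characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: the extension test proposed in the thread.
# Matches .yaml, .dts and .dtsi; anything else is skipped.
sub scans_for_in_file_emails {
    my ($file) = @_;
    return $file =~ /\.(?:yaml|dtsi?)$/ ? 1 : 0;
}

# Hypothetical helper: simplified name check, letters and ASCII spaces only.
sub name_is_all_letters {
    my ($name) = @_;
    return $name =~ /^[\p{L} ]+$/ ? 1 : 0;
}

# "\x{160}" is U+0160 (S with caron), written as an escape so the
# sketch does not depend on the encoding of this source file.
my $bytes = encode('UTF-8', "Alvin \x{160}ipraga");  # as read without a decoding layer
my $chars = decode('UTF-8', $bytes);                 # as read through :encoding(UTF-8)

for my $file (qw(8250.yaml board.dts soc.dtsi driver.c)) {
    printf "%-10s %s\n", $file,
        scans_for_in_file_emails($file) ? 'scanned' : 'skipped';
}
printf "decoded text all letters: %d\n", name_is_all_letters($chars);
printf "raw bytes all letters:    %d\n", name_is_all_letters($bytes);
```

In the raw byte string the two UTF-8 bytes of the first letter of the surname are treated as two separate code points, and the second of them (0xA0) is not a letter, so the simplified pattern rejects the name. Whether this is also what causes the corrupted MAINTAINERS output quoted above is not established here.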