Subject: Re: [PATCH] get_maintainer: correctly parse UTF-8 encoded names in files
From: Joe Perches
To: Duje Mihanović, Alvin Šipraga, Linus Torvalds
Cc: Konstantin Ryabitsev, linux-kernel@vger.kernel.org, Alvin Šipraga
Date: Mon, 16 Oct 2023 15:17:56 -0700
In-Reply-To: <5719647.DvuYhMxLoT@radijator>
References: <20231014-get-maintainers-utf8-v1-1-3af8c7aeb239@bang-olufsen.dk> <5719647.DvuYhMxLoT@radijator>

On Mon, 2023-10-16 at 16:37 +0200, Duje Mihanović wrote:
> On Saturday, October 14, 2023 7:22:44 PM CEST Alvin Šipraga wrote:
> > From: Alvin Šipraga
> >
> > While the script correctly extracts UTF-8 encoded names from the
> > MAINTAINERS file, the regular expressions damage my name when parsing
> > from .yaml files. Fix this by replacing the Latin-1-compatible regular
> > expressions with the unicode property matcher \p{Latin}.

Well, OK

> > It's also
> > necessary to instruct Perl to open all files with UTF-8 encoding.

But I'm not at all sure this is actually desired.
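
For reference, here is a minimal sketch of the two effects the quoted patch description talks about. It is written in Python rather than Perl, and it is not the code from get_maintainer.pl: the pattern LATIN1_NAME and the helpers is_latin_letter and latin_name_prefix are hypothetical names made up for this illustration. Python's standard re module has no \p{Latin}, so the script test is only approximated here via Unicode character names.

```python
import re
import unicodedata

# "Alvin Šipraga" as raw UTF-8 bytes, the way it would sit in a .yaml file.
raw = "Alvin Šipraga".encode("utf-8")

# Hypothetical stand-in for a Latin-1-only name pattern: ASCII letters plus
# the Latin-1 range U+00C0..U+00FF. "Š" is U+0160, which is outside that range.
LATIN1_NAME = re.compile(r"[A-Za-z\u00C0-\u00FF]+(?: [A-Za-z\u00C0-\u00FF]+)*")


def is_latin_letter(ch: str) -> bool:
    """Rough approximation of Perl's \\p{Latin} using Unicode character names."""
    return ch.isalpha() and unicodedata.name(ch, "").startswith("LATIN")


def latin_name_prefix(text: str) -> str:
    """Return the longest leading run of Latin-script letters and spaces."""
    end = 0
    for ch in text:
        if ch == " " or is_latin_letter(ch):
            end += 1
        else:
            break
    return text[:end].rstrip()


decoded = raw.decode("utf-8")       # bytes read with UTF-8 decoding
misdecoded = raw.decode("latin-1")  # same bytes read without UTF-8 decoding

latin1_match = LATIN1_NAME.match(decoded).group(0)  # stops before "Š"
script_match = latin_name_prefix(decoded)           # keeps the whole name
broken_match = latin_name_prefix(misdecoded)        # script test alone is not enough
```

In this sketch the Latin-1 class truncates the name at "Š", while the script-based test keeps it whole, but only when the bytes were decoded as UTF-8 first. With the bytes read as Latin-1, the two UTF-8 bytes of "Š" turn into "Å" plus a no-break space and the name is cut short anyway, which is presumably why the patch pairs the regex change with opening files as UTF-8.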