From: Alvin Šipraga
Subject: [PATCH v3 0/2] get_maintainer: correctly parse UTF-8 encoded names in files
Date: Tue, 19 Dec 2023 02:25:13 +0100
Message-Id: <20231219-get-maintainers-utf8-v3-0-f85a39e2265a@bang-olufsen.dk>
X-Mailing-List: linux-kernel@vger.kernel.org
To: Joe Perches, Linus Torvalds, Andrew Morton
Cc: Duje Mihanović, Konstantin Ryabitsev, linux-kernel@vger.kernel.org, Alvin Šipraga

Signed-off-by: Alvin Šipraga
---
Changes in v3:
- add more rationale for opening everything with UTF-8 encoding
- fix a separate issue identified when introducing UTF-8 names, namely
  that they would not get escaped with quotes as expected, due to Perl's
  default behaviour being to match UTF-8 characters with \w
- add a second patch to fix an unrelated issue mentioned by Joe whereby
  a mailing list might get the display name '-'
- Link to v2: https://lore.kernel.org/r/20231214-get-maintainers-utf8-v2-1-b188dc7042a4@bang-olufsen.dk

Changes in v2:
- use '\p{L}' rather than '\p{Latin}', so that matching is even more
  inclusive (i.e. match also Greek letters, CJK, etc.)
- fix commit message to refer to tools mailing list, not b4 mailing list
- Link to v1: https://lore.kernel.org/r/20231014-get-maintainers-utf8-v1-1-3af8c7aeb239@bang-olufsen.dk

---
Alvin Šipraga (2):
      get_maintainer: correctly parse UTF-8 encoded names in files
      get_maintainer: remove stray punctuation when cleaning file emails

 scripts/get_maintainer.pl | 48 +++++++++++++++++++++++++++--------------------
 1 file changed, 28 insertions(+), 20 deletions(-)
---
base-commit: 2cf4f94d8e8646803f8fb0facf134b0cd7fb691a
change-id: 20231014-get-maintainers-utf8-32c65c4d6f8a
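
The two matching issues named in the changelogs above can be sketched
outside the script. This is an illustration only: it is Python, not the
Perl of scripts/get_maintainer.pl, and it is not the code from the
patches. The helpers needs_quotes, is_letter and is_latin_letter are
hypothetical names, and the Latin check is a rough approximation of
'\p{Latin}' based on Unicode character names. Python's re module happens
to share the relevant default with Perl: for text patterns, \w matches
Unicode word characters unless restricted to ASCII.

```python
import re
import unicodedata

# Shape of a "does this display name need quoting?" check: look for any
# character that is not a word character, a space, or a hyphen.
MUST_QUOTE = r"[^\w \-]"


def needs_quotes(name: str, ascii_only: bool) -> bool:
    """Hypothetical helper: True if `name` contains a must-quote character."""
    flags = re.ASCII if ascii_only else 0
    return re.search(MUST_QUOTE, name, flags) is not None


def is_letter(ch: str) -> bool:
    """Rough stand-in for \\p{L}: general category starts with 'L'."""
    return unicodedata.category(ch).startswith("L")


def is_latin_letter(ch: str) -> bool:
    """Rough stand-in for \\p{Latin}, approximated via the character name."""
    return is_letter(ch) and unicodedata.name(ch, "").startswith("LATIN")


if __name__ == "__main__":
    name = "Alvin Šipraga"
    # Unicode-aware \w accepts 'Š', so the name is never flagged for quoting.
    print(needs_quotes(name, ascii_only=False))
    # ASCII-only \w rejects 'Š', so the name is flagged as expected.
    print(needs_quotes(name, ascii_only=True))
    # A letter-wide class accepts Greek and CJK; a Latin-only class does not.
    for ch in "Šα山":
        print(ch, is_letter(ch), is_latin_letter(ch))
```

With Unicode-aware \w the name passes as if it were plain word
characters and so is never quoted, which mirrors the v3 escaping issue.
The last loop mirrors the v2 change: a Latin-only class rejects Greek
and CJK letters that a general letter class accepts.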