Subject: Re: [PATCH] checkpatch: handle utf8 while computing length of commit msg lines
From: Joe Perches
To: Antonio Borneo, Andy Whitcroft, Dwaipayan Ray, Lukas Bulwahn, linux-kernel@vger.kernel.org
Cc: Andrew Morton, Linus Torvalds
Date: Fri, 21 Oct 2022 22:48:20 -0700
In-Reply-To: <20221021191507.9026-1-antonio.borneo@foss.st.com>
References: <20221021191507.9026-1-antonio.borneo@foss.st.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 2022-10-21 at 21:15 +0200, Antonio Borneo wrote:
> The current check for the length of each line in the commit msg
> uses length($line) that counts line's bytes.
> If the line contains utf8 characters, the byte count can exceed
> the cap even on quite short lines.
>
> Count the utf8 characters for checking line length.
>
> Signed-off-by: Antonio Borneo
> ---
>
> Actually it's not fully clear to me if utf8 characters in the
> commit msg are acceptable/tolerated or to be avoided.

Nor is it to me. It's likely OK though, as checkpatch at least has
an existing test/comment for nominally valid UTF-8 in commit messages.

		CHK("INVALID_UTF8",
		    "Invalid UTF-8, patch and commit message should be encoded in UTF-8\n" . $hereptr);

> In the commit msg of 15662b3e8644 ("checkpatch: add a --strict
> check for utf-8 in commit logs") is stated:
> Some find using utf-8 in commit logs inappropriate.

I don't particularly care one way or another.

Andrew? Linus?

> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index 1e5e66ae5a52..eaad5da50554 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -3220,7 +3220,7 @@ sub process {
>  
>  # Check for line lengths > 75 in commit log, warn once
>  		if ($in_commit_log && !$commit_log_long_line &&
> -		    length($line) > 75 &&
> +		    length(decode("utf8", $line)) > 75 &&
>  		    !($line =~ /^\s*[a-zA-Z0-9_\/\.]+\s+\|\s+\d+/ ||
>  					# file delta changes
>  		      $line =~ /^\s*(?:[\w\.\-\+]*\/)++[\w\.\-\+]+:/ ||
>
> base-commit: 9abf2313adc1ca1b6180c508c25f22f9395cc780
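
For anyone following along, the byte-versus-character difference the patch
description refers to can be sketched outside of checkpatch. The snippet below
is only an illustration in Python, not the Perl used by checkpatch.pl; the
helper names and the sample line are hypothetical, and the 75 limit is taken
from the quoted hunk. It assumes the line is valid UTF-8.

```python
# Illustration only: a Python stand-in for the two Perl length() calls
# in the quoted hunk. Helper names and the sample line are hypothetical.

LIMIT = 75  # commit-log line cap from the quoted check


def byte_length(raw: bytes) -> int:
    """Length in bytes, analogous to length($line) on an undecoded line."""
    return len(raw)


def char_length(raw: bytes) -> int:
    """Length in code points, analogous to length(decode("utf8", $line))."""
    return len(raw.decode("utf-8"))


# 40 copies of U+00E9, each encoded as two bytes in UTF-8.
sample = ("\u00e9" * 40).encode("utf-8")

print(byte_length(sample), byte_length(sample) > LIMIT)  # 80 True
print(char_length(sample), char_length(sample) > LIMIT)  # 40 False
```

Note that this counts code points, not display columns, so combining marks or
double-width characters could still make the count differ from what a reader
sees on screen.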