Date: Mon, 10 May 2021 15:22:02 -0400
From: "Theodore Ts'o"
To: David Woodhouse
Cc: Mauro Carvalho Chehab, Linux Doc Mailing List,
    linux-kernel@vger.kernel.org, Jonathan Corbet,
    alsa-devel@alsa-project.org, coresight@lists.linaro.org,
    dri-devel@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
    intel-wired-lan@lists.osuosl.org, keyrings@vger.kernel.org,
    kvm@vger.kernel.org, linux-acpi@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org, linux-edac@vger.kernel.org,
    linux-ext4@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net,
    linux-fpga@vger.kernel.org, linux-hwmon@vger.kernel.org,
    linux-iio@vger.kernel.org, linux-input@vger.kernel.org,
    linux-integrity@vger.kernel.org, linux-media@vger.kernel.org,
    linux-pci@vger.kernel.org, linux-pm@vger.kernel.org,
    linux-rdma@vger.kernel.org, linux-riscv@lists.infradead.org,
    linux-sgx@vger.kernel.org, linux-usb@vger.kernel.org,
    mjpeg-users@lists.sourceforge.net, netdev@vger.kernel.org,
    rcu@vger.kernel.org, x86@kernel.org
Subject: Re: [PATCH 00/53] Get rid of UTF-8 chars that can be mapped as ASCII

On Mon, May 10, 2021 at 02:49:44PM +0100, David Woodhouse wrote:
> On Mon, 2021-05-10 at 13:55 +0200, Mauro Carvalho Chehab wrote:
> > This patch series is doing conversion only when using ASCII makes
> > more sense than using UTF-8.
> >
> > See, a number of converted documents ended with weird characters
> > like ZERO WIDTH NO-BREAK SPACE (U+FEFF) character. This specific
> > character doesn't do any good.
> >
> > Others use NO-BREAK SPACE (U+A0) instead of 0x20. Harmless, until
> > someone tries to use grep[1].
>
> Replacing those makes sense. But replacing emdashes — which are a
> distinct character that has no direct replacement in ASCII and which
> people do *deliberately* use instead of hyphen-minus — does not.

I regularly use --- for em-dashes and -- for en-dashes. Markdown will
automatically translate 3 ASCII hyphens to em-dashes, and 2 ASCII
hyphens to en-dashes. It's much, much easier for me to type 2 or 3
hyphens into my text editor of choice than trying to enter the UTF-8
characters. If we can make Sphinx do this translation, maybe that's
the best way of dealing with these two characters?

Cheers,

					- Ted
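
For illustration, the translation described above (three ASCII hyphens
to an em-dash, two to an en-dash) could be sketched in a few lines of
Python. This is only a rough sketch and not how Markdown or Sphinx
actually implements it: the function name ascii_dashes_to_unicode and
the decision to leave runs of four or more hyphens untouched (so that
heading underlines survive) are assumptions made for this example.

```python
import re

EM_DASH = "\u2014"  # what "---" is translated to
EN_DASH = "\u2013"  # what "--" is translated to

# The lookarounds make each pattern match a run of exactly three (or
# exactly two) hyphens, so longer runs such as "-----" are left alone.
_TRIPLE = re.compile(r"(?<!-)---(?!-)")
_DOUBLE = re.compile(r"(?<!-)--(?!-)")


def ascii_dashes_to_unicode(text: str) -> str:
    """Hypothetical helper: map '---' to an em-dash and '--' to an en-dash."""
    text = _TRIPLE.sub(EM_DASH, text)
    return _DOUBLE.sub(EN_DASH, text)


if __name__ == "__main__":
    print(ascii_dashes_to_unicode("emdashes --- which are distinct"))
    print(ascii_dashes_to_unicode("see pages 3--5"))
```

Note that a naive pass like this would also rewrite things such as
command-line options ("--help"), so a real tool would have to skip
literal and code blocks.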