Received: by 2002:a05:6a10:c7d3:0:0:0:0 with SMTP id h19csp915138pxy; Sun, 15 Aug 2021 04:25:08 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxxY33IEe+xOVxlJeGAbcDtGQAKK30CIqlj7IIG7xWkplk2Oc//paJ3VVOSO4/uRw20G0OI X-Received: by 2002:a17:906:d057:: with SMTP id bo23mr11218414ejb.208.1629026708058; Sun, 15 Aug 2021 04:25:08 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1629026708; cv=none; d=google.com; s=arc-20160816; b=F4Kw/j6xUYZs2k5aTuYSiYw2GDX5+0wKvusMMjwh/yviePkr4WEBhlaPFS42Hc5f3C 18Vs61RLFzXAqZLrBjST2DntmYH+NVAZadFYXVPN+suY2geIToNdKg8mSZXOwqyUgJYa PH49QtF6ta43kjv6H19SkGKVf3jqZGL8u7+FRizEPHwzSfh/w8cxeKiVI8MYNr7u/w39 FUL9zQmh7itK1/qY3slQxecL7eDOcx08i3xMoBkumTVYSrMJJD7LvwpjA/G9Sq/KnsMZ csLTt8wDfs8fC8zWsIlzA39WrtDhbmWOtU2WMeJX892Zt0/s7g3JOYSpaXstHskqnGDA EsTA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :user-agent:message-id:in-reply-to:date:references:subject:cc:to :from; bh=OnP11bPDlpAEVsIIjq6VI522If5hydJfUQBX4AyP084=; b=dl3w4CiTBIgARv9fQNNhKiMhXj434TSGL2+J7GxGcXWaRYUL8594Qr1s0ULBVySgw+ JAkjHyvOTG74QaEHvhq/8867XYewnGKNXdHedY7V3teVbchDIpjCh2Eoe5Xd5U4Om+re nqPZfgeaZ4y128lLlrzFCQztYeDAla/icWWicbElcv19n+O/sY0T4IdrzG1xtkxWbqxy vqMfiDwwFT5P8FqiIh3jOS37YHwVJn5zBVIzuEH+HzpZ6Qn73sHt2kOFK5d7D4+/NUEd xeMqpYYG66V6Q2bZNjPvPzvGLhnpTZ+ZA76dld+j5VZ3JNHt1wU9oiyaX2i4gcc9H74Z 6dyA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id u17si7510508eda.469.2021.08.15.04.24.38; Sun, 15 Aug 2021 04:25:08 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237382AbhHOLXp (ORCPT + 99 others); Sun, 15 Aug 2021 07:23:45 -0400 Received: from mail.parknet.co.jp ([210.171.160.6]:32808 "EHLO mail.parknet.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229597AbhHOLXo (ORCPT ); Sun, 15 Aug 2021 07:23:44 -0400 Received: from ibmpc.myhome.or.jp (server.parknet.ne.jp [210.171.168.39]) by mail.parknet.co.jp (Postfix) with ESMTPSA id D641815F93A; Sun, 15 Aug 2021 20:23:09 +0900 (JST) Received: from devron.myhome.or.jp (foobar@devron.myhome.or.jp [192.168.0.3]) by ibmpc.myhome.or.jp (8.15.2/8.15.2/Debian-22) with ESMTPS id 17FBN8lj265086 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT); Sun, 15 Aug 2021 20:23:09 +0900 Received: from devron.myhome.or.jp (foobar@localhost [127.0.0.1]) by devron.myhome.or.jp (8.15.2/8.15.2/Debian-22) with ESMTPS id 17FBN8Jk1660981 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT); Sun, 15 Aug 2021 20:23:08 +0900 Received: (from hirofumi@localhost) by devron.myhome.or.jp (8.15.2/8.15.2/Submit) id 17FBN6An1660978; Sun, 15 Aug 2021 20:23:06 +0900 From: OGAWA Hirofumi To: Pali =?iso-8859-1?Q?Roh=E1r?= Cc: linux-fsdevel@vger.kernel.org, linux-ntfs-dev@lists.sourceforge.net, linux-cifs@vger.kernel.org, jfs-discussion@lists.sourceforge.net, linux-kernel@vger.kernel.org, Alexander Viro , Jan Kara , "Theodore Y . Ts'o" , Luis de Bethencourt , Salah Triki , Andrew Morton , Dave Kleikamp , Anton Altaparmakov , Pavel Machek , Marek =?iso-8859-1?Q?Beh=FAn?= , Christoph Hellwig Subject: Re: [RFC PATCH 01/20] fat: Fix iocharset=utf8 mount option References: <20210808162453.1653-1-pali@kernel.org> <20210808162453.1653-2-pali@kernel.org> <87h7frtlu0.fsf@mail.parknet.co.jp> <20210815094224.dswbjywnhvajvzjv@pali> Date: Sun, 15 Aug 2021 20:23:06 +0900 In-Reply-To: <20210815094224.dswbjywnhvajvzjv@pali> ("Pali =?iso-8859-1?Q?Roh=E1r=22's?= message of "Sun, 15 Aug 2021 11:42:24 +0200") Message-ID: <871r6vt0it.fsf@mail.parknet.co.jp> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org To: Pali Roh?r Cc: linux-fsdevel@vger.kernel.org, linux-ntfs-dev@lists.sourceforge.net, linux-cifs@vger.kernel.org, jfs-discussion@lists.sourceforge.net, linux-kernel@vger.kernel.org, Alexander Viro , Jan Kara , "Theodore Y . Ts'o" , Luis de Bethencourt , Salah Triki , Andrew Morton , Dave Kleikamp , Anton Altaparmakov , Pavel Machek , Marek Beh?n , Christoph Hellwig Subject: Re: [RFC PATCH 01/20] fat: Fix iocharset=utf8 mount option From: OGAWA Hirofumi Gcc: nnimap+ibmpc.myhome.or.jp:Sent --text follows this line-- To: Pali Roh?r Cc: linux-fsdevel@vger.kernel.org, linux-ntfs-dev@lists.sourceforge.net, linux-cifs@vger.kernel.org, jfs-discussion@lists.sourceforge.net, linux-kernel@vger.kernel.org, Alexander Viro , Jan Kara , "Theodore Y . Ts'o" , Luis de Bethencourt , Salah Triki , Andrew Morton , Dave Kleikamp , Anton Altaparmakov , Pavel Machek , Marek Beh?n , Christoph Hellwig Subject: Re: [RFC PATCH 01/20] fat: Fix iocharset=utf8 mount option From: OGAWA Hirofumi Gcc: nnimap+ibmpc.myhome.or.jp:Sent --text follows this line-- Pali Roh?r writes: >> This change is not equivalent to utf8=1. In the case of utf8=1, vfat >> uses iocharset's conversion table and it can handle more than ascii. >> >> So this patch is incompatible changes, and handles less chars than >> utf8=1. So I think this is clean though, but this would be regression >> for user of utf8=1. > > I do not think so... But please correct me, as this code around is mess. > > Without this change when utf8=1 is set then iocharset= encoding is used > for case-insensitivity implementation (toupper / tolower conversion). > For all other parts are use correct utf8* conversion functions. > > But you use touppper / tolower functions from iocharset= encoding on > stream of utf8 bytes then you either get identity or some unpredictable > garbage in utf8. So when comparing two (different) non-ASCII filenames > via this method you in most cases get that filenames are different. > Because converting their utf8 bytes via toupper / tolower functions from > iocharset= encoding results in two different byte sequences in most > cases. Even for two utf8 case-insensitive same strings. > > But you can play with it and I guess it is possible to find two > different utf8 strings which after toupper / tolower conversion from > some iocharset= encoding would lead to same byte sequence. > > This patch uses for utf8 tolower / touppser function simple 7-bit > tolower / toupper ascii function. And so for 7-bit ascii file names > there is no change. > > So this patch changes behavior when comparing non 7-bit ascii file > names, but only in cases when previously two different file names were > marked as same. As now they are marked correctly as different. So this > is changed behavior, but I guess it is bug fix which is needed. > If you want I can put this change into separate patch. > > Issue that two case-insensitive same files are marked as different is > not changed by this patch and therefore this issue stay here. OK, sure. utf8 looks like broken than I was thinking (although user can use iocharset=ascii and utf8=1 for this). The code might be better to clean up a bit more though, looks like good basically. One thing, please update FAT_DEFAULT_IOCHARSET help in Kconfig and Documentation/filesystems/vfat.rst (with new warning about iocharset=utf8). Thanks. -- OGAWA Hirofumi