Received: by 2002:a05:6a10:c7d3:0:0:0:0 with SMTP id h19csp862405pxy; Sun, 15 Aug 2021 02:43:37 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwOkLGyb/RL4bSo3yHUhGcKCNmyfoPYcWGELetmznek0pRVXL36XMvv7bUHeRLvOASqr5Ct X-Received: by 2002:a05:6638:25c3:: with SMTP id u3mr3664910jat.52.1629020616830; Sun, 15 Aug 2021 02:43:36 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1629020616; cv=none; d=google.com; s=arc-20160816; b=GsvfLwteR4SBMAWYZ3QJKA+mibOM8PGdnXe1bwi1HNQNsXkY/q78GnaLDDIKdwMVK+ wrYtHLOAN0GmQaf+5r+w3LNUm3AWRH6nA2xbomCydINbTITf/+lZE7yS2gsfGBm7JEhU c5vr4b2Sat5IrLlT0o75hOgECT1fGyJhM3Ft8F/G4nBqNizGOJVHlxxElmF5BxLi2jnz a79zBzvDsIt2I+K0dkM0/xTVSBZ0VTIYh2cXmJsqNdcNslENKHr+VWQlnm/mnbUEIpka 1oi11gGJerXfXen03AUlrvxeXzDbdgm1Rz0MGJyF1phqlcntUMdwIlfWdV2kKAC+9yoB LDbw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:user-agent:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=Cyw/X3qMacB8los9HJhEiYoWaOFiJMVWfBFH5CfSTdo=; b=lUpLg/A6KtGLzxzO6oEYaJOsNbE/+fq0oR+o5IKnkVJ6kmwfA+BmjddmBn+NcCYQXV M81Q++8ash1fFp7yP9F9IBz+x8x8GiqrNLD+EGXZ32b8sLgHM55jM0tMHe1syp04yoAV 9OHsmejMh9eX9/5hVLqoZPqC21hue9lzCNYegmbQucNNlaWW37D+eTOEWOxp6RqFJBzX Y2VkW8Z++DsjmhbsKXa7zvASWzTw2Dg58WwJJcrz86mlKpavpxlqmLAWEVioUxGo+PnW hFhWiGxWGyAgXV58zZlLVMbu7KUcyd8aXljcmfOpfo143zBuyggSIy7o/nSzRTdp6SRz dP5Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=HhBuMmVM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id o3si3936749jah.81.2021.08.15.02.43.23; Sun, 15 Aug 2021 02:43:36 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=HhBuMmVM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237260AbhHOJm6 (ORCPT + 99 others); Sun, 15 Aug 2021 05:42:58 -0400 Received: from mail.kernel.org ([198.145.29.99]:34786 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233413AbhHOJm5 (ORCPT ); Sun, 15 Aug 2021 05:42:57 -0400 Received: by mail.kernel.org (Postfix) with ESMTPSA id BA0F860295; Sun, 15 Aug 2021 09:42:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1629020547; bh=GSfHvZE/Q+UBpMvO/QW4NWjmTbLIJVvGzR58O4IoWVo=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=HhBuMmVMJ3oXH0dzUi0Fuy5CdwP6wfFidET79td5sbgoRkXMXjjTToVpC+PsCW8x9 Yr6MfwdVkHjQozyWXN1s2T5M89VkdTfrVoKEMKFEmiR0PWHNhlY29Lv58QqLWd5tE8 QZC/iZctxtu5UVFMQ39lb2ttM0+NOsGxEqPWCIh91pdzLrXUXvDux9ShB6C+0SvXwd mBIuhtvlw7EF/9AxhSmVA0QA3mtM+3oR8sAWW9XGiIQ3mZ0vEd9tgTIVcV0Yqn4w4c gQXFz7pXE/ZcrMV5c+/QfD7KzDhJGAtMw55xytvJf4QBX3Ar7vUKlLCRfSe75hvz60 U32qpiamU50zA== Received: by pali.im (Postfix) id C887C98C; Sun, 15 Aug 2021 11:42:24 +0200 (CEST) Date: Sun, 15 Aug 2021 11:42:24 +0200 From: Pali =?utf-8?B?Um9ow6Fy?= To: OGAWA Hirofumi Cc: linux-fsdevel@vger.kernel.org, linux-ntfs-dev@lists.sourceforge.net, linux-cifs@vger.kernel.org, jfs-discussion@lists.sourceforge.net, linux-kernel@vger.kernel.org, Alexander Viro , Jan Kara , "Theodore Y . Ts'o" , Luis de Bethencourt , Salah Triki , Andrew Morton , Dave Kleikamp , Anton Altaparmakov , Pavel Machek , Marek =?utf-8?B?QmVow7pu?= , Christoph Hellwig Subject: Re: [RFC PATCH 01/20] fat: Fix iocharset=utf8 mount option Message-ID: <20210815094224.dswbjywnhvajvzjv@pali> References: <20210808162453.1653-1-pali@kernel.org> <20210808162453.1653-2-pali@kernel.org> <87h7frtlu0.fsf@mail.parknet.co.jp> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <87h7frtlu0.fsf@mail.parknet.co.jp> User-Agent: NeoMutt/20180716 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sunday 15 August 2021 12:42:47 OGAWA Hirofumi wrote: > Pali Rohár writes: > > > Currently iocharset=utf8 mount option is broken and error is printed to > > dmesg when it is used. To use UTF-8 as iocharset, it is required to use > > utf8=1 mount option. > > > > Fix iocharset=utf8 mount option to use be equivalent to the utf8=1 mount > > option and remove printing error from dmesg. > > This change is not equivalent to utf8=1. In the case of utf8=1, vfat > uses iocharset's conversion table and it can handle more than ascii. > > So this patch is incompatible changes, and handles less chars than > utf8=1. So I think this is clean though, but this would be regression > for user of utf8=1. I do not think so... But please correct me, as this code around is mess. Without this change when utf8=1 is set then iocharset= encoding is used for case-insensitivity implementation (toupper / tolower conversion). For all other parts are use correct utf8* conversion functions. But you use touppper / tolower functions from iocharset= encoding on stream of utf8 bytes then you either get identity or some unpredictable garbage in utf8. So when comparing two (different) non-ASCII filenames via this method you in most cases get that filenames are different. Because converting their utf8 bytes via toupper / tolower functions from iocharset= encoding results in two different byte sequences in most cases. Even for two utf8 case-insensitive same strings. But you can play with it and I guess it is possible to find two different utf8 strings which after toupper / tolower conversion from some iocharset= encoding would lead to same byte sequence. This patch uses for utf8 tolower / touppser function simple 7-bit tolower / toupper ascii function. And so for 7-bit ascii file names there is no change. So this patch changes behavior when comparing non 7-bit ascii file names, but only in cases when previously two different file names were marked as same. As now they are marked correctly as different. So this is changed behavior, but I guess it is bug fix which is needed. If you want I can put this change into separate patch. Issue that two case-insensitive same files are marked as different is not changed by this patch and therefore this issue stay here. > Thanks. > -- > OGAWA Hirofumi