Date: Sun, 19 Jan 2020 23:08:09 +0000
From: Al Viro
To: Pali Rohár
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	"Theodore Y. Ts'o", OGAWA Hirofumi, Namjae Jeon,
	Gabriel Krisman Bertazi
Subject: Re: vfat: Broken case-insensitive support for UTF-8
Message-ID: <20200119230809.GW8904@ZenIV.linux.org.uk>
In-Reply-To: <20200119221455.bac7dc55g56q2l4r@pali>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Jan 19, 2020 at 11:14:55PM +0100, Pali Rohár wrote:

> So when UTF-8 on VFS for VFAT is enabled, then for VFS <--> VFAT
> conversion are used utf16s_to_utf8s() and utf8s_to_utf16s() functions.
> But in fat_name_match(), vfat_hashi() and vfat_cmpi() functions is used
> NLS table (default iso8859-1) with nls_strnicmp() and nls_tolower().
>
> Which means that fat_name_match(), vfat_hashi() and vfat_cmpi() are
> broken for vfat in UTF-8 mode.
>
> I was thinking how to fix it, and the only possible way is to write a
> uni_tolower() function which takes one Unicode code point and returns
> lowercase of input's Unicode code point. We cannot do any Unicode
> normalization as VFAT specification does not say anything about it and
> MS reference fastfat.sys implementation does not do it neither.

Then how can that possibly be broken?  If it matches the native behaviour,
that's it.

> As you can see lowercase 'd' and uppercase 'D' are same, but lowercase
> 'č' and uppercase 'Č' are not same. This is because 'č' is two bytes
> 0xc4 0x8d sequence and comparing is done by Latin1 table. 0xc4 is in
> Latin 'Ä' which is already in uppercase. 0x8d is control char so is not
> changed by tolower/toupper function.

Again, who the hell cares?  Does the behaviour match how Windows handles
that thing?  "Case" is not something well-defined; the only definition
is "whatever weird crap does the native implementation choose to do".
That's the only reason to support that garbage at all...
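
The byte-level behaviour described in the quoted text can be reproduced outside the kernel. The sketch below is only an illustration in Python, not the vfat code: latin1_casefold_equal() and codepoint_casefold_equal() are hypothetical names, and Python's str.lower() stands in for both the iso8859-1 NLS table and the proposed uni_tolower(). Neither is claimed to match what Windows or fastfat.sys actually does, which is the open question in this thread.

```python
def latin1_casefold_equal(a: bytes, b: bytes) -> bool:
    """Compare two names byte by byte, treating every byte as ISO-8859-1.

    Roughly what happens when UTF-8 names are fed through a Latin-1
    case table: each byte of a multi-byte sequence is folded on its own.
    """
    return a.decode("latin-1").lower() == b.decode("latin-1").lower()


def codepoint_casefold_equal(a: bytes, b: bytes) -> bool:
    """Decode the names as UTF-8 first, then lowercase code points.

    No Unicode normalization is applied. str.lower() is an assumption
    standing in for a per-code-point uni_tolower().
    """
    return a.decode("utf-8").lower() == b.decode("utf-8").lower()


small = "č".encode("utf-8")    # bytes 0xc4 0x8d
capital = "Č".encode("utf-8")  # bytes 0xc4 0x8c

print(latin1_casefold_equal(b"d", b"D"))         # ASCII pair
print(latin1_casefold_equal(small, capital))     # Latin-1 table on UTF-8 bytes
print(codepoint_casefold_equal(small, capital))  # code point lowering
```

The second comparison fails because the trailing bytes 0x8d and 0x8c are C1 control characters in Latin-1 and are left unchanged, so the two names never compare equal. Whether the third result is the desired one depends entirely on what the native implementation does with the same pair.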