Date: Mon, 20 Jan 2020 17:27:01 +0100
From: Pali Rohár
To: David Laight
Cc: OGAWA Hirofumi, "linux-kernel@vger.kernel.org",
    "linux-fsdevel@vger.kernel.org", "Theodore Y. Ts'o", Namjae Jeon,
    Gabriel Krisman Bertazi
Subject: Re: vfat: Broken case-insensitive support for UTF-8
Message-ID: <20200120162701.guxcrmqysejaqw6y@pali>
In-Reply-To: <1a4c545dc7f14e33b7e59321a0aab868@AcuMS.aculab.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Monday 20 January 2020 15:47:22 David Laight wrote:
> From: Pali Rohár
> > Sent: 20 January 2020 15:20
> ...
> > This is not possible. There is 1:1 mapping between UTF-8 sequence and
> > Unicode code point. wchar_t in kernel represent either one Unicode code
> > point (limited up to U+FFFF in NLS framework functions) or 2bytes in
> > UTF-16 sequence (only in utf8s_to_utf16s() and utf16s_to_utf8s()
> > functions).
>
> Unfortunately there is neither a 1:1 mapping of all possible byte sequences
> to wchar_t (or unicode code points),

I was talking about valid UTF-8 sequences (invalid, ill-formed sequences
are out of the game and would certainly always cause problems).

> nor a 1:1 mapping of all possible wchar_t values to UTF-8.

This is not true. There is exactly one way to convert a sequence of
Unicode code points to UTF-8. UTF stands for Unicode Transformation
Format, and it defines exactly how Unicode is transformed.

If you have a valid UTF-8 sequence, it describes exactly one sequence of
Unicode code points. And if you have a sequence (ordinals) of Unicode
code points, there is exactly one representation of it in UTF-8.

I suggest you read the Unicode Standard, section 2.5, Encoding Forms.

> Really both need to be defined - even for otherwise 'invalid' sequences.
>
> Even the 16-bit values above 0xd000 can appear on their own in
> windows filesystems (according to wikipedia).

If you are talking about UTF-16 (which is _not_ 16-bit as you wrote),
look at my previous email:

"MS FAT32 implementations allows half of UTF-16 surrogate pair stored in FS."

> It is all to easy to get sequences of values that cannot be converted
> to/from UTF-8.
>
> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)

-- 
Pali Rohár
pali.rohar@gmail.com
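
A minimal sketch of the two positions in this exchange, assuming Python's
strict built-in UTF-8 codec as a stand-in for a conforming converter. This
is not the kernel NLS code, and the helper names encode_scalar_values and
decode_valid_utf8 are hypothetical. It shows that a well-formed sequence
round-trips to exactly the same code points, while an ill-formed byte
sequence and a lone UTF-16 surrogate half (the case FAT can store) are both
rejected. Strictly speaking, the one-to-one mapping holds for Unicode scalar
values, that is, code points outside the surrogate range U+D800..U+DFFF.

```python
def encode_scalar_values(code_points):
    """Encode a sequence of Unicode code points as strict UTF-8.

    Raises UnicodeEncodeError if the sequence contains a surrogate
    (U+D800..U+DFFF), which has no well-formed UTF-8 representation.
    """
    return "".join(chr(cp) for cp in code_points).encode("utf-8")


def decode_valid_utf8(data):
    """Decode strict UTF-8 bytes into a list of code points.

    Raises UnicodeDecodeError if the input is ill-formed.
    """
    return [ord(ch) for ch in data.decode("utf-8")]


# Valid case: U+00E1, U+20AC and U+1F600 (the last one is outside the BMP).
points = [0x00E1, 0x20AC, 0x1F600]
encoded = encode_scalar_values(points)
round_trip = decode_valid_utf8(encoded)

# Ill-formed bytes: 0xC0 0xAF is an overlong encoding of U+002F.
overlong_rejected = False
try:
    decode_valid_utf8(b"\xc0\xaf")
except UnicodeDecodeError:
    overlong_rejected = True

# Lone high surrogate, i.e. half of a UTF-16 surrogate pair.
lone_surrogate_rejected = False
try:
    encode_scalar_values([0xD83D])
except UnicodeEncodeError:
    lone_surrogate_rejected = True
```

Both rejected inputs fall outside the "valid UTF-8 sequence" case discussed
above. How a filesystem driver should treat them is a separate policy
decision that this sketch does not attempt to model.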