In-Reply-To: <20200303070928.aawxoyeq77wnc3ts@yavin>
From: lampahome
Date: Tue, 3 Mar 2020 18:13:56 +0800
Subject: Re: why do we need utf8 normalization when compare name?
To: Aleksa Sarai
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org

> Unicode normalisation will take the strings "ñ" (U+00F1) and "n◌̃"
> (U+006E U+0303) and turn them into the same Unicode string. Note that
> there are four kinds of Unicode normalisation (NFD, NFC, NFKD, NFKC), so
> what precise string you end up with depends on which form you're using.
> Linux uses NFD, I believe.
>
> And yes, once the strings are normalised and encoded as UTF-8 you then
> do a byte-by-byte comparison (if the comparison is case-insensitive then
> fs/unicode/... will case-fold the Unicode symbols during normalisation).
>

What confuses me is why the strings are encoded as UTF-8 after
normalisation has finished.

From the above, normalisation turns "ñ" (U+00F1) and "n◌̃" (U+006E U+0303)
into the same Unicode string. Why can't we then just compare the bytes of
the normalised strings?
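To make the quoted steps concrete, here is a minimal userspace sketch in Python. It is not the kernel's fs/unicode code, and the helper name names_equal is hypothetical. It assumes NFD, as the quoted text suggests, and uses Python's standard unicodedata module. The idea it illustrates: normalisation works on code points, while a stored name is a byte sequence, so the normalised code points are encoded as UTF-8 and those bytes are compared.

```python
import unicodedata


def names_equal(a: str, b: str, casefold: bool = False) -> bool:
    """Hypothetical helper: normalise both names to NFD, optionally
    case-fold them, then compare their UTF-8 encodings byte by byte."""
    na = unicodedata.normalize("NFD", a)
    nb = unicodedata.normalize("NFD", b)
    if casefold:
        # Case folding can produce unnormalised output, so normalise again.
        na = unicodedata.normalize("NFD", na.casefold())
        nb = unicodedata.normalize("NFD", nb.casefold())
    # The comparison itself is a plain byte comparison.
    return na.encode("utf-8") == nb.encode("utf-8")


precomposed = "\u00f1"   # "ñ" as a single code point, U+00F1
decomposed = "n\u0303"   # "n" followed by U+0303 COMBINING TILDE

# Without normalisation the UTF-8 bytes differ.
raw_bytes_differ = precomposed.encode("utf-8") != decomposed.encode("utf-8")

# After NFD normalisation both names encode to the same bytes.
same_after_nfd = names_equal(precomposed, decomposed)
```

In this sketch the raw encodings differ (two bytes for U+00F1, three bytes for U+006E U+0303), while the normalised encodings are identical, which is why the byte comparison happens only after normalisation.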