From: Linus Torvalds
Date: Mon, 20 Nov 2023 18:29:05 -0800
Subject: Re: [f2fs-dev] [PATCH v6 0/9] Support negative dentries on case-insensitive ext4 and f2fs
To: "Theodore Ts'o"
Cc: Christian Brauner, Gabriel Krisman Bertazi, viro@zeniv.linux.org.uk, linux-f2fs-devel@lists.sourceforge.net, ebiggers@kernel.org, linux-fsdevel@vger.kernel.org, jaegeuk@kernel.org, linux-ext4@vger.kernel.org
X-Mailing-List: linux-ext4@vger.kernel.org
In-Reply-To: <20231121020254.GB291888@mit.edu>
References: <20230816050803.15660-1-krisman@suse.de> <20231025-selektiert-leibarzt-5d0070d85d93@brauner> <655a9634.630a0220.d50d7.5063SMTPIN_ADDED_BROKEN@mx.google.com> <20231120-nihilismus-verehren-f2b932b799e0@brauner> <20231121020254.GB291888@mit.edu>

On Mon, 20 Nov 2023 at 18:03, Theodore Ts'o wrote:
>
> On Mon, Nov 20, 2023 at 10:07:51AM -0800, Linus Torvalds wrote:
> > I'm looking at things like
> > generic_ci_d_compare(), and it hurts to see the mindless "let's do
> > lookups and compares one utf8 character at a time". What a disgrace.
> > Somebody either *really* didn't care, or was a Unicode person who
> > didn't understand the point of UTF-8.
>
> This isn't because of case-folding brain damage, but rather Unicode
> brain damage.

No, it really is just stupidity and horribleness.

The thing is, when you check two strings for equality, the FIRST THING
you should do is to just compare them for exactly that: equality.

And no, the way you do that is not by checking each unicode character
one by one.

You do it by just doing a regular memcmp. In fact, you can do even
better than that: while at it, check whether

 (a) all bytes are equal in everything but bit#5

 (b) none of the bytes have the high bit set

and you have now narrowed down things in a big way. You can do these
things trivially one whole word at a time, and you'll handle 99% of
all input without EVER doing any Unicode garbage AT ALL.

Yes, yes, if you actually have complex characters, you end up having
to deal with that mess. But no, that is *not* an excuse for saying
"all characters are complex".

So no. There is absolutely zero excuse for doing stupid things, except
for "nobody has ever cared, because case folding is so stupid to begin
with that people just expect it to perform horribly badly".

End result:

 - generic_ci_d_compare() should *not* consider the memcmp() to be a
   "fall back to this for non-casefolded". You should start with that,
   and if the bytes are equal then the strings are equal. End of story.

 - if the bytes are not equal, then the strings *might* still compare
   equal if it's a casefolded directory.

 - but EVEN THEN you shouldn't fall back to actually doing UTF-8
   decoding unless you saw the high bit being set at some point.

 - and if they differ in anything but bit #5 and you didn't see the
   high bit, you know they are different.

It's a bit complicated, yes. But no, doing things one unicode
character at a time is just bad bad bad.

               Linus
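
The word-at-a-time pre-check described above could look roughly like the
following. This is only a minimal userspace sketch, not the kernel code
and not a proposed patch: the names ci_precheck() and enum ci_precheck
are hypothetical, it assumes both names have the same length, and it
leaves the caller to decide what to do for each outcome. Note that
"differs only in bit #5" is a narrowing step, not a final answer: '@'
and '`' also differ only in that bit, so an ASCII-only letter check is
still needed in that case.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result codes for illustration; not a kernel API. */
enum ci_precheck {
	CI_EQUAL,	/* bytes identical: the strings are equal */
	CI_DIFFERENT,	/* pure ASCII, differ outside bit #5: cannot match */
	CI_NEED_ASCII,	/* pure ASCII, differ only in bit #5: check letters */
	CI_NEED_UTF8,	/* high bit seen: full Unicode comparison needed */
};

/* Assumes both buffers are exactly 'len' bytes long. */
static enum ci_precheck ci_precheck(const unsigned char *a,
				    const unsigned char *b, size_t len)
{
	uint64_t diff = 0;	/* OR of all (a ^ b): which bits ever differ */
	uint64_t seen = 0;	/* OR of all bytes: was the high bit ever set */
	size_t i = 0;

	/* Whole 8-byte words; memcpy avoids unaligned-access problems. */
	for (; i + 8 <= len; i += 8) {
		uint64_t wa, wb;

		memcpy(&wa, a + i, 8);
		memcpy(&wb, b + i, 8);
		diff |= wa ^ wb;
		seen |= wa | wb;
	}

	/* Remaining tail bytes land in the low byte of the accumulators. */
	for (; i < len; i++) {
		diff |= (uint64_t)(a[i] ^ b[i]);
		seen |= (uint64_t)(a[i] | b[i]);
	}

	if (!diff)
		return CI_EQUAL;
	if (seen & 0x8080808080808080ULL)
		return CI_NEED_UTF8;
	if (diff & ~0x2020202020202020ULL)
		return CI_DIFFERENT;
	return CI_NEED_ASCII;
}
```

With this shape, only the CI_NEED_UTF8 outcome would ever reach the
per-character Unicode path; the other three outcomes are settled with
plain word operations, plus a cheap ASCII letter check for
CI_NEED_ASCII.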