Date: Tue, 21 Nov 2023 00:12:15 -0500
From: "Theodore Ts'o"
To: Linus Torvalds
Cc: Christian Brauner, Gabriel Krisman Bertazi, viro@zeniv.linux.org.uk,
 linux-f2fs-devel@lists.sourceforge.net, ebiggers@kernel.org,
 linux-fsdevel@vger.kernel.org, jaegeuk@kernel.org, linux-ext4@vger.kernel.org
Subject: Re: [f2fs-dev] [PATCH v6 0/9] Support negative dentries on case-insensitive ext4 and f2fs
Message-ID: <20231121051215.GA335601@mit.edu>

On Mon, Nov 20, 2023 at 07:03:13PM -0800, Linus Torvalds wrote:
> On Mon, 20 Nov 2023 at 18:29, Linus Torvalds wrote:
> >
> > It's a bit complicated, yes. But no, doing things one unicode
> > character at a time is just bad bad bad.
>
> Put another way: the _point_ of UTF-8 is that ASCII is still ASCII.
> It's literally why UTF-8 doesn't suck.
>
> So you can still compare ASCII strings as-is.
>
> No, that doesn't help people who are really using other locales, and
> are actively using complicated characters.
>
> But it very much does mean that you can compare "Bad" and "bad" and
> never ever look at any unicode translation ever.

Yeah, agreed, that would be a nice optimization. However, in the
unfortunate case where (a) it's non-ASCII, and (b) the input string is
non-normalized and/or differs in case, we end up scanning some portion
of the two strings twice: once doing the strcmp, and once doing the
Unicode slow path.

That being said, even when we're dealing with non-ASCII strings, in
the fairly common case where the program is doing a readdir() followed
by an open() or stat(), the filename will be byte-identical, and so a
strcmp() will suffice. So I agree that it's a nice optimization.

It'd be interesting to see how much such an optimization would
actually show up in various benchmarks. It'd have to be something that
was really metadata-heavy, or else the filename lookups would get
drowned out.

						- Ted
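
To make the ordering of the checks discussed above concrete, here is a minimal user-space sketch in Python. It is not the kernel code and not anything from the patch series. The names names_match() and slow_fold() are hypothetical, and NFD normalization followed by str.casefold() is only an assumed stand-in for the kernel's UTF-8 casefolding. The ASCII check is also simplified to whole names rather than a word at a time.

```python
import unicodedata


def slow_fold(name):
    """Slow path: normalize, then case-fold.

    Assumption: NFD + str.casefold() stands in for the real folding rules.
    """
    return unicodedata.normalize("NFD", name).casefold()


def names_match(a, b):
    """Case-insensitive match of two filenames given as bytes.

    Returns (matched, path), where path names the check that decided it.
    """
    # Fast path 1: byte-identical names, e.g. readdir() followed by
    # open() or stat(). This works for non-ASCII names too.
    if a == b:
        return True, "bytes"

    # Fast path 2: both names are pure ASCII, so folding is a plain
    # byte-wise lowercase and no Unicode tables are consulted.
    if a.isascii() and b.isascii():
        return a.lower() == b.lower(), "ascii"

    # Slow path: non-ASCII and not byte-identical. Both names are
    # scanned again from the start, which is the double scan noted above.
    try:
        return slow_fold(a.decode("utf-8")) == slow_fold(b.decode("utf-8")), "unicode"
    except UnicodeDecodeError:
        # Invalid UTF-8 is treated as opaque bytes, which already differ.
        return False, "unicode"
```

With this sketch, names_match(b"Bad", b"bad") is decided on the ASCII path, and two byte-identical non-ASCII names never reach slow_fold(). Only a pair such as a precomposed "É" against "e" plus a combining acute accent pays for both the byte comparison and the Unicode pass.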