Date: Wed, 18 Mar 2020 10:32:51 +0100
From: Pali Rohár
To: Al Viro
Cc: Namjae Jeon, Sungjong Seo, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/4] exfat: Simplify exfat_utf8_d_hash() for code points above U+FFFF
Message-ID:
<20200318093251.bgxd3l5om4zlm3br@pali>
References: <20200317222555.29974-1-pali@kernel.org> <20200317222555.29974-2-pali@kernel.org> <20200318000925.GB23230@ZenIV.linux.org.uk>
In-Reply-To: <20200318000925.GB23230@ZenIV.linux.org.uk>

On Wednesday 18 March 2020 00:09:25 Al Viro wrote:
> On Tue, Mar 17, 2020 at 11:25:52PM +0100, Pali Rohár wrote:
> > Function partial_name_hash() takes long type value into which can be stored
> > one Unicode code point. Therefore conversion from UTF-32 to UTF-16 is not
> > needed.
>
> Hmm... You might want to update the comment in stringhash.h...

Well, initially I did not look at the hashing functions deeply. The
hashing function used here is defined in stringhash.h as:

	static inline unsigned long
	partial_name_hash(unsigned long c, unsigned long prevhash)
	{
		return (prevhash + (c << 4) + (c >> 4)) * 11;
	}

I guess it was designed for 8-bit types, not for long (64-bit) types,
and I'm not sure how effective it is even for the 16-bit types for which
it is already used.

So the question is: what should we do for either a 21-bit number (one
Unicode code point, the equivalent of UTF-32) or a sequence of 16-bit
numbers (UTF-16)?

Any opinions?
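
To make the two options concrete, here is a minimal userspace sketch. It assumes a local copy of partial_name_hash() as quoted above; hash_code_point() and hash_utf16_units() are hypothetical names used only for illustration and are not part of the exfat driver or of stringhash.h.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace copy of the helper quoted above, not the kernel header itself. */
static inline unsigned long
partial_name_hash(unsigned long c, unsigned long prevhash)
{
	return (prevhash + (c << 4) + (c >> 4)) * 11;
}

/* Hypothetical: mix one Unicode code point (up to 21 bits) in a single step. */
static unsigned long hash_code_point(uint32_t cp, unsigned long hash)
{
	assert(cp <= 0x10FFFF);
	return partial_name_hash(cp, hash);
}

/* Hypothetical: mix the same code point as one or two UTF-16 code units. */
static unsigned long hash_utf16_units(uint32_t cp, unsigned long hash)
{
	assert(cp <= 0x10FFFF);
	if (cp < 0x10000)
		return partial_name_hash(cp, hash);

	/* Code points above U+FFFF become a high and a low surrogate. */
	cp -= 0x10000;
	hash = partial_name_hash(0xD800 + (cp >> 10), hash);
	return partial_name_hash(0xDC00 + (cp & 0x3FF), hash);
}
```

For code points up to U+FFFF both helpers give the same value. Above U+FFFF they differ, so whichever form is chosen would have to be applied consistently wherever names are hashed.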