Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 22518C433F5 for ; Thu, 9 Dec 2021 22:07:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233250AbhLIWL3 (ORCPT ); Thu, 9 Dec 2021 17:11:29 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59202 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233204AbhLIWL2 (ORCPT ); Thu, 9 Dec 2021 17:11:28 -0500 Received: from mail-ed1-x534.google.com (mail-ed1-x534.google.com [IPv6:2a00:1450:4864:20::534]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CBBEAC0617A1 for ; Thu, 9 Dec 2021 14:07:54 -0800 (PST) Received: by mail-ed1-x534.google.com with SMTP id x15so24707265edv.1 for ; Thu, 09 Dec 2021 14:07:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux-foundation.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=RHArNxFMHgs9P3JmMQUjRcZVsX1YeVD3wW/DAeZt5nA=; b=JhUfOUJjxhGRzHwksR/Q8zG3IINBIKaphSkRwJzWYADLahSsN6d7219QvbpoEd8Y7k Ufk/H24zbV7UnfRzWnZtmEkNZrNUh/75SD5YVhHcTNkE8nvjcjbuV/8uU1FIxASJeztH 4gA4ykWichWHKmWefAIY/la0fzYo0je1n6xY0= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=RHArNxFMHgs9P3JmMQUjRcZVsX1YeVD3wW/DAeZt5nA=; b=CxEMcsdXaAecRvYqRqbvsQ9QQDkybZkaXGMrj1rGdAWYkCLX0i73v6ly0NnVRBEW70 VSLzGDtFP4SneNEY647jPK94PwMyxRlcBYF/WnTs/LsZTvXXEzCvMI9XXhjUGqWg+sLl JaWY0zynlTEEPq+deHY0VTKObSJO8/LwKB2l0RvGKSsx47Qio3bYl/XlcknzMNdbNjKa a89oEEVXXHLcqYB/wgvDV5qU6P4Yk1CyQ72XHx3x+289jLhZ0MyPky/9KXxcsyZhqrph fycBW+HAF/fBckTIPGGpDYXSCkqzn09FC0DOBVfQtwV7qH8lmomBXxUVM50pUe2b5Rbu Ljag== X-Gm-Message-State: AOAM531hRjERVL0s/xSLSZF3RxDICuH9UKNp1GxKD4Ii2hWla36U4zGw N/dX55Hjut1OOTAJyKHyTTGnJ4AyN5ZcBOeRICM= X-Google-Smtp-Source: ABdhPJyl/R47shGdv+vraApv5kfhlacBHbpfLK/+B6tTWkZVGM1YrwjWXnE7IlOfvSqOVl5G2M2zCw== X-Received: by 2002:a17:906:9754:: with SMTP id o20mr18336901ejy.277.1639087673219; Thu, 09 Dec 2021 14:07:53 -0800 (PST) Received: from mail-wr1-f41.google.com (mail-wr1-f41.google.com. [209.85.221.41]) by smtp.gmail.com with ESMTPSA id d19sm478582edt.34.2021.12.09.14.07.49 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 09 Dec 2021 14:07:51 -0800 (PST) Received: by mail-wr1-f41.google.com with SMTP id u17so12119307wrt.3 for ; Thu, 09 Dec 2021 14:07:49 -0800 (PST) X-Received: by 2002:adf:f8c3:: with SMTP id f3mr9535285wrq.495.1639087669394; Thu, 09 Dec 2021 14:07:49 -0800 (PST) MIME-Version: 1.0 References: <163906878733.143852.5604115678965006622.stgit@warthog.procyon.org.uk> <163906888735.143852.10944614318596881429.stgit@warthog.procyon.org.uk> <159180.1639087053@warthog.procyon.org.uk> In-Reply-To: <159180.1639087053@warthog.procyon.org.uk> From: Linus Torvalds Date: Thu, 9 Dec 2021 14:07:33 -0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH v2 07/67] fscache: Implement a hash function To: David Howells Cc: linux-cachefs@redhat.com, Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet , Jeff Layton , Matthew Wilcox , Alexander Viro , Omar Sandoval , JeffleXu , linux-afs@lists.infradead.org, "open list:NFS, SUNRPC, AND..." , CIFS , ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel , Linux Kernel Mailing List Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Dec 9, 2021 at 1:57 PM David Howells wrote: > > What I'm trying to get at is that the hash needs to be consistent, no matter > the endianness of the cpu, for any particular input blob. Yeah, if that's the case, then you should probably make that "unsigned int *data" argument probably just be "void *" and then: > a = *data++; <<<<<<< > HASH_MIX(x, y, a); > } > return fold_hash(x, y); > } > > The marked line should probably use something like le/be32_to_cpu(). Yes, it should be using a '__le32 *' inside that function and you should use l32_to_cpu(). Obviously, BE would work too, but cause unnecessary work on common hardware. But as mentioned for the other patches, you should then also be a lot more careful about always using the end result as an 'unsigned int' (or maybe 'u32') too, and when comparing hashes for binary search or other, you should always do th4e compare in some stable format. Because doing return (long)hash_a - (long)hash_b; and looking at the sign doesn't actually result in a stable ordering on 32-bit architectures. You don't get a transitive ordering (ie a < b and b < c doesn't imply a < c). And presumably if the hashes are meaningful across machines, then hash comparisons should also be meaningful across machines. So when comparing hashes, you need to compare them either in a truly bigger signed type (and make sure that doesn't get truncated) - kind of like how a lot of 'memcmp()' functions do 'unsigned char' subtractions in an 'int' - or you need to compare them _as_ 'unsigned int'. Otherwise the comparisons will be all kinds of messed up. Linus