References: <3A493D20-568A-4D63-A575-5DEEBFAAF41A@dilger.ca>
From: Avi Deitcher
Date: Mon, 11 Oct 2021 08:30:36 -0700
Subject: Re: algorithm for half-md4 used in htree directories
To: linux-ext4@vger.kernel.org

Does someone know how this is constructed and used?

On Mon, Oct 4, 2021 at 12:57 AM Avi Deitcher wrote:
>
> Hi Andreas,
>
> I had looked in __ext4fs_dirhash(). Yes, it does reference the seed -
> and creates a default if none is there at the filesystem level - but it
> doesn't appear to use it in that function. hinfo is populated in the
> function - hash, minor-hash, seed - but it never uses the seed to
> manipulate the hash.
>
> Are you saying that it is at a higher level? i.e. __ext4fs_dirhash()
> is the *first* step, and there is further processing to get the actual
> hash? I did walk up the stack, but couldn't figure it out.
>
> Thanks for stepping in
> Avi
>
> On Sun, Oct 3, 2021 at 7:43 PM Andreas Dilger wrote:
> >
> > On Oct 3, 2021, at 06:47, Avi Deitcher wrote:
> > >
> > > I can narrow down the question further.
> > > In my live sample, one of the
> > > entries in the tree is for a directory named "dir155".
> > >
> > > If I run "dx_hash dir155", I get:
> > >
> > > # debugfs -R "dx_hash dir155" /var/lib/file.img
> > > debugfs 1.46.2 (28-Feb-2021)
> > > Hash of dir155 is 0x16279534 (minor 0x0)
> > >
> > > If I look in the tree with "htree_dump", I get:
> > >
> > > # debugfs -R "htree_dump /testdir" /var/lib/file.img
> > > debugfs 1.46.2 (28-Feb-2021)
> > > ....
> > > Entry #0: Hash 0x00000000, block 1
> > > Reading directory block 1, phys 6459
> > > 168 0x00d11d98-b9b6b16b (16) dir155 332 0x009edafe-77de7d72 (16) dir319
> > >
> > > That hash for dir155 does not match what dx_hash gave. If I try to
> > > take the code from fs/ext4/hash.c and build a small program to
> > > calculate the hash, I get:
> > >
> > > $ ./md4 dir155
> > > MD4: d90278a1 25182ac7 a02e56be c3f30f04
> > > hash: 0x25182ac6
> > > minor: 0xa02e56be
> > >
> > > Clearly that isn't what is in the tree. What basic thing am I missing?
> >
> > One important factor is that the directory hash has an initial seed
> > to prevent pathological cases where the user can construct thousands
> > of directory entries that have a hash collision.
> >
> > Looking at the code explains this in the comment for __ext4fs_dirhash().
> > The seed itself comes from sbi->s_hash_seed and is stored in the
> > per-directory hinfo.seed to be used when computing the filename hash.
> > In theory there could be a per-directory hash, but it appears to be a
> > constant for the whole filesystem.
> >
> > Cheers, Andreas
> >
> >
> >
> >> On Fri, Oct 1, 2021 at 2:49 PM Avi Deitcher wrote:
> > >>
> > >> Hi,
> > >>
> > >> I have been trying to understand the algorithm used for the "half-md4"
> > >> in htree-structured directories. Going through the code (and trying
> > >> not to get into reverse engineering), it looks like it is part of md4
> > >> but not entirely?
> > >> Yet any subset I take doesn't quite line up with
> > >> what I see in an actual sample.
> > >>
> > >> What is the algorithm it is using to turn an entry of, e.g., "file125"
> > >> into the appropriate hash? I did run a live sample, and tried to get
> > >> some form of correlation between the actual md4 hash (16 bytes) of the
> > >> above and the actual entry (4 bytes) shown by debugfs, without much
> > >> luck.
> > >>
> > >> What basic thing am I missing?
> > >>
> > >> Separately, how does the seed play into it?
> > >>
> > >> Thanks
> > >> Avi
> > >
> > >
> > >
> > > --
> > > Avi Deitcher
> > > avi@deitcher.net
> > > Follow me http://twitter.com/avideitcher
> > > Read me http://blog.atomicinc.com
> >
>
> --
> Avi Deitcher
> avi@deitcher.net
> Follow me http://twitter.com/avideitcher
> Read me http://blog.atomicinc.com

--
Avi Deitcher
avi@deitcher.net
Follow me http://twitter.com/avideitcher
Read me http://blog.atomicinc.com
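
For reference, the question about the seed can be made concrete with a small userspace sketch in C. It is a from-memory reconstruction of the half-MD4 path in fs/ext4/hash.c, not a copy of it, so the round order, rotation counts and constants should be checked against the kernel source before relying on them. The names str2hashbuf and half_md4_transform follow the kernel; dirhash_half_md4 is a hypothetical wrapper introduced here. Assumptions: the unsigned-char variant of the string packing is used, the seed is passed as four host-order 32-bit words, and the special case for the end-of-directory hash value is left out. The point it tries to illustrate is that a non-zero seed replaces the standard MD4 initial state, which would explain why a calculation that starts from the default state does not match the values stored in the tree.

```c
#include <stdint.h>
#include <string.h>

/* Rotate a 32-bit word left by s bits (0 < s < 32). */
static uint32_t rol32(uint32_t v, unsigned s)
{
    return (v << s) | (v >> (32 - s));
}

/* MD4 round functions; K2 and K3 are the standard MD4 round constants. */
#define F(x, y, z) ((z) ^ ((x) & ((y) ^ (z))))
#define G(x, y, z) (((x) & (y)) + (((x) ^ (y)) & (z)))
#define H(x, y, z) ((x) ^ (y) ^ (z))
#define ROUND(f, a, b, c, d, x, s) (a += f(b, c, d) + (x), a = rol32(a, s))
#define K2 0x5A827999u
#define K3 0x6ED9EBA1u

/* Cut-down MD4: three rounds of eight steps over eight input words.
 * Reconstructed from memory (assumption); verify against fs/ext4/hash.c. */
static void half_md4_transform(uint32_t buf[4], const uint32_t in[8])
{
    uint32_t a = buf[0], b = buf[1], c = buf[2], d = buf[3];

    /* Round 1 */
    ROUND(F, a, b, c, d, in[0], 3);   ROUND(F, d, a, b, c, in[1], 7);
    ROUND(F, c, d, a, b, in[2], 11);  ROUND(F, b, c, d, a, in[3], 19);
    ROUND(F, a, b, c, d, in[4], 3);   ROUND(F, d, a, b, c, in[5], 7);
    ROUND(F, c, d, a, b, in[6], 11);  ROUND(F, b, c, d, a, in[7], 19);
    /* Round 2 */
    ROUND(G, a, b, c, d, in[1] + K2, 3);  ROUND(G, d, a, b, c, in[3] + K2, 5);
    ROUND(G, c, d, a, b, in[5] + K2, 9);  ROUND(G, b, c, d, a, in[7] + K2, 13);
    ROUND(G, a, b, c, d, in[0] + K2, 3);  ROUND(G, d, a, b, c, in[2] + K2, 5);
    ROUND(G, c, d, a, b, in[4] + K2, 9);  ROUND(G, b, c, d, a, in[6] + K2, 13);
    /* Round 3 */
    ROUND(H, a, b, c, d, in[3] + K3, 3);   ROUND(H, d, a, b, c, in[7] + K3, 9);
    ROUND(H, c, d, a, b, in[2] + K3, 11);  ROUND(H, b, c, d, a, in[6] + K3, 15);
    ROUND(H, a, b, c, d, in[1] + K3, 3);   ROUND(H, d, a, b, c, in[5] + K3, 9);
    ROUND(H, c, d, a, b, in[0] + K3, 11);  ROUND(H, b, c, d, a, in[4] + K3, 15);

    buf[0] += a; buf[1] += b; buf[2] += c; buf[3] += d;
}

/* Pack up to num*4 name bytes into 32-bit words, most significant byte
 * first, filling unused bytes and words with a pad derived from the length.
 * Unsigned-char variant (assumption; a signed variant also exists). */
static void str2hashbuf(const char *msg, int len, uint32_t *buf, int num)
{
    const unsigned char *ucp = (const unsigned char *)msg;
    uint32_t pad, val;
    int i;

    pad = (uint32_t)len | ((uint32_t)len << 8);
    pad |= pad << 16;
    val = pad;
    if (len > num * 4)
        len = num * 4;
    for (i = 0; i < len; i++) {
        val = ucp[i] + (val << 8);
        if ((i % 4) == 3) {
            *buf++ = val;
            val = pad;
            num--;
        }
    }
    if (--num >= 0)
        *buf++ = val;
    while (--num >= 0)
        *buf++ = pad;
}

/* Hypothetical wrapper: returns the major hash and optionally the minor
 * hash. A seed with any non-zero word replaces the default MD4 state. */
static uint32_t dirhash_half_md4(const char *name, int len,
                                 const uint32_t seed[4], uint32_t *minor)
{
    uint32_t buf[4] = { 0x67452301u, 0xefcdab89u, 0x98badcfeu, 0x10325476u };
    uint32_t in[8];

    if (seed && (seed[0] | seed[1] | seed[2] | seed[3]))
        memcpy(buf, seed, sizeof(buf));

    /* Consume the name in 32-byte chunks, carrying the state forward. */
    while (len > 0) {
        str2hashbuf(name, len, in, 8);
        half_md4_transform(buf, in);
        len -= 32;
        name += 32;
    }
    if (minor)
        *minor = buf[2];
    return buf[1] & ~1u; /* the low bit of the major hash is always cleared */
}
```

A call such as dirhash_half_md4("dir155", 6, seed, &minor) would take the four seed words of the filesystem in question; passing NULL or an all-zero seed falls back to the default MD4 initial values. As far as I know, dumpe2fs -h reports the seed as "Directory Hash Seed" in UUID form, so it would first need converting into four little-endian 32-bit words.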