Received: by 2002:a25:8b91:0:0:0:0:0 with SMTP id j17csp1390989ybl; Fri, 6 Dec 2019 16:46:28 -0800 (PST) X-Google-Smtp-Source: APXvYqz4TCGDyna+KsWKTvCJ4wM8nIP+iI6lbZpiNL0LwbyZgcGbqZIpbCspKpGqEoY7SdVbB2ep X-Received: by 2002:a05:6808:ab1:: with SMTP id r17mr14336552oij.141.1575679588741; Fri, 06 Dec 2019 16:46:28 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1575679588; cv=none; d=google.com; s=arc-20160816; b=bpW5R46s/8Vyd5KPF7zZlNl6sHZRf3jSLFTPwbdG812gzkijB9sXQKU8SMU26GRxsS SrZ6Z+5WlUdRZr3so9umvC3roAdr2JHDVN2ZduxDr01gUc5d6RlG8+t3nCQkJxcmIXLc QZ25EQklGnMb9J/ZPXBk3X5FFpJwgFmINW9BPL/bzmb8bMyjVe5FCI2BFbnM9DUw0vNx /5muF4tomeTH3viRIn0hiaqmHSkgQJvjFMlHl2q6oQXAVYHKkzDYOowa78dmteQEEjCC VnmRljSoa7vKjvEtJyxEaEE88S52VaO5vf5/T2gy8ZeQfrfRl5jLjfvRtRzg0tnSxVal 00ZA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject; bh=x4DWoGUy7+Tvie29+T9NhterfUI8+gMH9CPXEo0V4pE=; b=KoMDrgSlMla2KnXa74CTlCLXPD/GLkTXG6HS48d+fs4vny19MnowQ9pBGO73sDvruh Otiq5dffLihoAeVFzs6zQJ7RnLBacsnEPMQaa9roM87mKqjm+mIFWRailbAVqXIxjmaX HPs9nwriqsiyLkVuyLWUv8h2ejicWXqSkB7/UCO1GY3ZPR+c2j7gIZbrRNmF3qVPrSNx P5AXHNyOHcK5C08dtn8LA9jZG6On7LQuH9lCq2rCoYYliJc/vBvtGR+KLjDVXN8BtLKW L1DIZ4xZLja1BW7Wbe8Q77WUF9nlCuRDbWbrn+3+Ib35AmBdXwyrUgzFrH4yraRmuUgZ RDVQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-ext4-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id p14si8558611ota.71.2019.12.06.16.46.10; Fri, 06 Dec 2019 16:46:28 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-ext4-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-ext4-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726386AbfLGAqI (ORCPT + 99 others); Fri, 6 Dec 2019 19:46:08 -0500 Received: from mail.phunq.net ([66.183.183.73]:52232 "EHLO phunq.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726388AbfLGAqI (ORCPT ); Fri, 6 Dec 2019 19:46:08 -0500 Received: from [172.16.1.14] by phunq.net with esmtpsa (TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_128_GCM:128) (Exim 4.92.3) (envelope-from ) id 1idOEX-0006U5-SC; Fri, 06 Dec 2019 16:46:05 -0800 Subject: Re: [RFC] Thing 1: Shardmap for Ext4 To: Vyacheslav Dubeyko , "Theodore Y. Ts'o" Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, OGAWA Hirofumi References: <176a1773-f5ea-e686-ec7b-5f0a46c6f731@phunq.net> <20191127142508.GB5143@mit.edu> <6b6242d9-f88b-824d-afe9-d42382a93b34@phunq.net> <9ed62cfea37bfebfb76e378d482bd521c7403c1f.camel@dubeyko.com> <37c9494c40998d23d0d68afaa5a7f942a23e8986.camel@dubeyko.com> From: Daniel Phillips Message-ID: Date: Fri, 6 Dec 2019 16:46:05 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.9.0 MIME-Version: 1.0 In-Reply-To: <37c9494c40998d23d0d68afaa5a7f942a23e8986.camel@dubeyko.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org On 2019-12-06 3:47 a.m., Vyacheslav Dubeyko wrote: > On Thu, 2019-12-05 at 01:46 -0800, Daniel Phillips wrote: >> On 2019-12-04 7:55 a.m., Vyacheslav Dubeyko wrote: >>>> > > > >> And here is a diagram of the Shardmap three level hashing scheme, >> which ties everything together: >> >> https://github.com/danielbot/Shardmap/wiki/Shardmap-hashing-scheme >> >> This needs explanation. It is something new that you won't find in >> any >> textbook, this is the big reveal right here. >> > > This diagram is pretty good and provides the high-level view of the > whole scheme. But, maybe, it makes sense to show the granularity of > hash code. It looks like the low hash is the hash of a name. Am I > correct? Not quite. A 64 bit hash code is computed per name, then divided up into three parts as shown in the diagram. Each part of the hash addresses a different level of the Shardmap index hierarchy: high bits address the top level shard array, giving a pointer to a shard; middle bits address a hash bucket within that shard; low bits are used to resolve collisions within the hash bucket (and collisions still may occur even when the low bits are considered, forcing a record block access and full string compare. > But how the mid- and high- parts of the hash code are defined? Given the above description, does the diagram make sense? If so I will add this description to the wiki. > It looks like that cached shard stores LBAs of record entry blocks are > associated with the low hash values. Rather, associated with the entire hash value. > But what does it mean that shard is cached? This is the cache form of the shard, meaning that the unordered hash/lba index pairs (duopack) were read in from media and loaded into this cache object (or newly created by recent directory operations.) > Here is a diagram of the cache structures, very simple: >> >> https://github.com/danielbot/Shardmap/wiki/Shardmap-cache-format > > This diagram is not easy to relate with the previous one. So, shard > table and shard array are the same entities or not? They are, and I have updated the hashing scheme diagram to refer to both as "array". I will similarly update the code, which currently calls the shard array field "map". > Or do you mean that > shard table is storeed on the volume but shard array is constructed in > memory? Sorry about that, it should be clear now. On the volume, a simple unordered collection of hash:lba pairs is stored per shard, which is reorganized into shard cache form (a hash table object) at demand-load time. >> There is a diagram here: >> > https://github.com/danielbot/Shardmap/wiki/Shardmap-record-block-format > > I am slightly confused here. Does header be located at the bottom of > the record block? The header (just 32 bytes at the moment, possibly to be expanded to 48 or 64) is stored at the top of the zeroth record entry block, which is therefore a little smaller than any other record entry block. > My understanding is that records grow from top of the > block down to the header direction. Am I correct? Why header is not > located at the top of the block with entry dictionary? Any special > purpose here? That should be clear now. I will add the above descriptive text to the wiki. Regards, Daniel