Date: Wed, 27 Nov 2019 21:28:17 -0500
From: "Theodore Y. Ts'o"
To: Daniel Phillips
Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, OGAWA Hirofumi
Subject: Re: [RFC] Thing 1: Shardmap fox Ext4
Message-ID: <20191128022817.GE22921@mit.edu>
References: <176a1773-f5ea-e686-ec7b-5f0a46c6f731@phunq.net>
    <20191127142508.GB5143@mit.edu>
X-Mailing-List: linux-ext4@vger.kernel.org

On Wed, Nov 27, 2019 at 02:27:27PM -0800, Daniel Phillips wrote:
> > (2) It's implemented as userspace code (e.g., it uses open(2),
> > mmap(2), et al.) and using C++, so it would need to be reimplemented
> > from scratch for use in the kernel.
>
> Right. Some of these details, like open, are obviously trivial, others
> less so. Reimplementing from scratch is an overstatement because the
> actual intrusions of user space code are just a small portion of the code
> and nearly all abstracted behind APIs that can be implemented as needed
> for userspace or kernel in out of line helpers, so that the main source
> is strictly unaware of the difference.

The use of C++ with templates is presumably one of the "less so"
parts, and it was that which I had in mind when I said,
"reimplementing from scratch".

> Also, most of this work is already being done for Tux3,

Great, when that work is done, we can take a look at the code and
see....

> > (5) The claim is made that readdir() accesses files sequentially; but
> > there is also mention in Shardmap of compressing shards (e.g.,
> > rewriting them) to squeeze out deleted and tombstone entries.
> > This pretty much guarantees that it will not be possible to satisfy
> > POSIX requirements of telldir(3)/seekdir(3) (using a 32-bit or 64-bit
> > cookie), NFS (which also requires use of a 32-bit or 64-bit cookie
> > while doing readdir scan), or readdir() semantics in the face of
> > directory entries getting inserted or removed from the directory.
>
> No problem, the data blocks are completely separate from the index so
> readdir just walks through them in linear order a la classic UFS/Ext2.
> What could possibly be simpler, faster or more POSIX compliant?

OK, so what you're saying then is that for every single directory entry
addition or removal, there must be (at least) two blocks which must be
modified: (at least one) index block and a data block, no?  That makes
it worse than htree, where most of the time we only need to modify a
single leaf node.  We only have to touch an index block when a leaf
node gets full and it needs to be split.

Anyway, let's wait and see how you and Hirofumi-san work out those
details for Tux3, and we can look at that and consider next steps at
that time.

Cheers,

					- Ted
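
Note on the telldir()/seekdir() point in (5) above: the cookie is an
opaque long that has to keep naming the same position in the directory
stream, which is the property the compaction concern is about. The
following is a minimal userspace sketch, not code from this thread;
resume_matches() and its parameters are hypothetical names, and it only
exercises the standard opendir(3), readdir(3), telldir(3) and seekdir(3)
calls on a directory that is assumed not to change during the call.

```c
#define _XOPEN_SOURCE 700   /* make telldir()/seekdir() visible in strict modes */
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Illustration only (hypothetical helper): skip `skip` entries, save the
 * stream position with telldir(), read one entry, rewind to the saved
 * cookie with seekdir(), and check that the same entry is returned again.
 *
 * Returns 1 if the names match, 0 if they differ, and -1 if the directory
 * cannot be opened or has too few entries.
 */
int resume_matches(const char *path, int skip)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    /* Advance past the first `skip` entries. */
    for (int i = 0; i < skip; i++) {
        if (readdir(dir) == NULL) {
            closedir(dir);
            return -1;
        }
    }

    /* Opaque cookie for the current position in the stream. */
    long cookie = telldir(dir);

    struct dirent *ent = readdir(dir);
    if (ent == NULL) {
        closedir(dir);
        return -1;
    }

    /* Copy the name: the dirent may be overwritten by the next readdir(). */
    char first[256];
    snprintf(first, sizeof(first), "%s", ent->d_name);

    /* Go back to the saved position and read the entry there again. */
    seekdir(dir, cookie);
    ent = readdir(dir);

    int same = (ent != NULL && strcmp(first, ent->d_name) == 0);
    closedir(dir);
    return same;
}
```

Calling resume_matches(".", 0) on a directory nobody is modifying should
return 1. The sketch says nothing about what happens when entries are
inserted or removed between telldir() and seekdir(), which is the case
the discussion above is concerned with.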