Date: Sat, 30 Nov 2019 12:50:46 -0500
From: "Theodore Y. Ts'o"
To: Daniel Phillips
Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, OGAWA Hirofumi
Subject: Re: [RFC] Thing 1: Shardmap fox Ext4
Message-ID: <20191130175046.GA6655@mit.edu>
References: <176a1773-f5ea-e686-ec7b-5f0a46c6f731@phunq.net>
    <20191127142508.GB5143@mit.edu> <20191128022817.GE22921@mit.edu>
    <3b5f28e5-2b88-47bb-1b32-5c2fed989f0b@phunq.net>
In-Reply-To: <3b5f28e5-2b88-47bb-1b32-5c2fed989f0b@phunq.net>
X-Mailing-List: linux-ext4@vger.kernel.org

On Wed, Nov 27, 2019 at 08:27:59PM -0800, Daniel Phillips wrote:
> You are right that Shardmap also must update the shard fifo tail block,
> however there is only one index shard up to 64K entries, so all the new
> index entries go into the same tail block(s).

So how big is an index shard?  If it is 64k entries, and each entry is
16 bytes (8 bytes hash, 8 bytes block number), then a shard is a
megabyte, right?  Are entries in an index shard stored in a sorted or
unsorted manner?  If they are stored in an unsorted manner, then when
trying to do a lookup, you need to search all of the index shard ---
which means for a directory that is being frequently accessed, the
entire index shard has to be kept in memory, no?  (Or paged in as
necessary, if you are using mmap in userspace).

> Important example: how is atomic directory commit going to work for
> Ext4?

The same way all metadata updates work in ext4.  Which is to say, you
need to declare the maximum number of 4k metadata blocks that an
operation might need to change when calling ext4_journal_start() to
create a handle; and before modifying a 4k block, you need to call
ext4_journal_get_write_access(), passing in the handle and the block's
buffer_head.  After modifying the block, you must call
ext4_handle_dirty_metadata() on the buffer_head.
And when you are done with the changes in an atomic metadata
operation, you call ext4_journal_stop() on the handle.  This hasn't
changed since the days of ext3 and htree.

- Ted
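
For anyone following the shard-size question above, the arithmetic can be checked with a short sketch. It assumes the entry layout guessed at in the message (8-byte hash plus 8-byte block number) and 4k blocks. The function names are made up for illustration and are not part of Shardmap or ext4, and the sorted case is included only as a hypothetical point of comparison.

```python
# Back-of-the-envelope check of the index shard size discussed above.
# All names here are illustrative; none come from Shardmap or ext4.

ENTRIES_PER_SHARD = 64 * 1024  # "64K entries", from the quoted message
ENTRY_BYTES = 8 + 8            # assumed layout: 8-byte hash + 8-byte block number
BLOCK_BYTES = 4096             # 4k blocks


def shard_bytes(entries=ENTRIES_PER_SHARD, entry_bytes=ENTRY_BYTES):
    """Total size of one full index shard, in bytes."""
    return entries * entry_bytes


def blocks_in_shard(block_bytes=BLOCK_BYTES):
    """Number of 4k blocks a full shard occupies."""
    return shard_bytes() // block_bytes


def worst_case_probes(entries=ENTRIES_PER_SHARD, is_sorted=False):
    """Entries examined by one lookup in the worst case.

    Unsorted: a linear scan may have to look at every entry.
    Sorted (hypothetical): binary search needs floor(log2(n)) + 1 probes,
    which for n >= 1 equals n.bit_length().
    """
    return entries.bit_length() if is_sorted else entries


if __name__ == "__main__":
    print(shard_bytes(), "bytes per shard")          # 1048576, i.e. 1 MiB
    print(blocks_in_shard(), "4k blocks per shard")  # 256
    print(worst_case_probes(), "entries scanned if unsorted")
    print(worst_case_probes(is_sorted=True), "probes if sorted")
```

Under these assumptions a full shard is exactly 1 MiB, or 256 4k blocks, and an unsorted lookup can end up touching all of them, which is the memory-residency concern raised in the question.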