From: Andreas Dilger
To: Josh Triplett
Cc: David Howells, Theodore Ts'o, "Darrick J. Wong", Chris Mason, Ext4 Developers List, xfs, linux-btrfs, linux-cachefs@redhat.com, linux-fsdevel, NeilBrown
Subject: Re: How capacious and well-indexed are ext4, xfs and btrfs directories?
Date: Tue, 25 May 2021 15:13:52 -0600
X-Mailing-List: linux-ext4@vger.kernel.org

On May 22, 2021, at 11:51 PM, Josh Triplett wrote:
>
> On Thu, May 20, 2021 at 11:13:28PM -0600, Andreas Dilger wrote:
>> On May 17, 2021, at 9:06 AM, David Howells wrote:
>>> With filesystems like ext4, xfs and btrfs, what are the limits on directory
>>> capacity, and how well are they indexed?
>>>
>>> The reason I ask is that inside of cachefiles, I insert fanout directories
>>> inside index directories to divide up the space for ext2 to cope with the
>>> limits on directory sizes and that it did linear searches (IIRC).
>>>
>>> For some applications, I need to be able to cache over 1M entries (render
>>> farm) and even a kernel tree has over 100k.
>>>
>>> What I'd like to do is remove the fanout directories, so that for each logical
>>> "volume"[*] I have a single directory with all the files in it. But that
>>> means sticking massive amounts of entries into a single directory and hoping
>>> it (a) isn't too slow and (b) doesn't hit the capacity limit.
>>
>> Ext4 can comfortably handle ~12M entries in a single directory, if the
>> filenames are not too long (e.g. 32 bytes or so). With the "large_dir"
>> feature (since 4.13, but not enabled by default) a single directory can
>> hold around 4B entries, basically all the inodes of a filesystem.
>
> ext4 definitely seems to be able to handle it. I've seen bottlenecks in
> other parts of the storage stack, though.
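For a rough sense of where the ~12M figure comes from, the htree limits can be estimated from the on-disk entry sizes. This is only a back-of-envelope sketch, not ext4 code: it assumes 4 KiB blocks, no metadata_csum (which takes a few index slots per block for checksum tails), names of one fixed length, and leaf blocks that are on average about half full after hash splits (the fill factor is a guess). The helper names are made up for illustration.

```python
# Back-of-envelope ext4 htree directory capacity estimate (hypothetical helpers).
# Assumptions: no metadata_csum tails, fixed-length names, and an assumed
# average leaf fill factor; real directories will vary.


def dirent_size(name_len: int) -> int:
    """On-disk size of one ext4_dir_entry_2: 8-byte header + name, padded to 4 bytes."""
    return (8 + name_len + 3) & ~3


def htree_leaf_blocks(block_size: int, index_levels: int) -> int:
    """Maximum leaf blocks reachable with `index_levels` levels of index blocks."""
    # Root block: "." and ".." entries (12 bytes each) plus 8 bytes of
    # dx_root_info come first; the rest holds 8-byte dx_entry slots.
    root_slots = (block_size - 12 - 12 - 8) // 8
    # Interior index block: an 8-byte fake dirent header, then 8-byte slots.
    node_slots = (block_size - 8) // 8
    return root_slots * node_slots ** (index_levels - 1)


def estimate_entries(name_len: int, block_size: int = 4096,
                     index_levels: int = 2, fill: float = 0.5) -> int:
    """Estimated entries, assuming leaves are `fill` full on average."""
    per_leaf = block_size // dirent_size(name_len)
    return int(htree_leaf_blocks(block_size, index_levels) * per_leaf * fill)


if __name__ == "__main__":
    # Without large_dir: root + one interior index level.
    print(f"{estimate_entries(32):,}")
    # With large_dir: root + two interior index levels.
    print(f"{estimate_entries(32, index_levels=3):,}")
```

With these assumptions, 32-byte names give roughly 13.2M entries for the two-level tree without large_dir, in line with the ~12M quoted above. With large_dir the three-level tree could address about 6.8B such entries, more than the 2^32 inode numbers ext4 allows, so the inode count becomes the real limit.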
>
> With a normal NVMe drive, a dm-crypt volume containing ext4, and discard
> enabled (on both ext4 and dm-crypt), I've seen rm -r of a directory with
> a few million entries (each pointing to a ~4-8k file) take the better
> part of an hour, almost all of it system time in iowait. Also makes any
> other concurrent disk writes hang, even a simple "touch x". Turning off
> discard speeds it up by several orders of magnitude.
>
> (I don't know if this is a known issue or not, so here are the details
> just in case it isn't. Also, if this is already fixed in a newer kernel,
> my apologies for the outdated report.)

"-o discard" is definitely known to have a measurable performance impact,
simply because it sends many more requests to the block device, and those
requests can be slow or block the queue, depending on how the underlying
storage behaves.

A patch that targets "-o discard" performance was posted recently:
https://patchwork.ozlabs.org/project/linux-ext4/list/?series=244091
It still needs a bit more work, but it may be worthwhile to test whether it
improves your workload, which would also help put some weight behind
landing it.

Another proposal was to change "-o discard" from "track every freed block
and submit TRIM" to "(persistently) track modified block groups and submit
background TRIM like fstrim for the whole group". One advantage of tracking
whole block groups is that block group state is already maintained in the
kernel and persistently on disk. This also provides a middle ground between
"immediate TRIM", which may not cover a whole erase block when it is run,
and "very lazy fstrim", which aggregates all free blocks in a group but only
happens when fstrim is run (anywhere from occasionally to never).
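To make the block-group proposal a little more concrete, here is a toy Python model of it. It is a sketch under stated assumptions, not the posted patch or actual ext4 code: the class and method names, the per-group "trimmed" flag and the min_free_to_trim threshold are all hypothetical, and a real implementation would track freed extents and issue block-layer discards rather than appending group numbers to a list. It also folds in the "is the filesystem busy" and "enough free space in the group" heuristics described in the next paragraph.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BlockGroup:
    # Hypothetical model of one block group, for illustration only.
    free_blocks: int = 0
    # Stands in for the persistent per-group "already trimmed" state the
    # proposal would keep; True means there is nothing new to discard.
    trimmed: bool = True


@dataclass
class LazyDiscard:
    groups: list[BlockGroup]
    # Hypothetical threshold: skip groups with too little free space to be
    # worth a TRIM request.
    min_free_to_trim: int
    trim_requests: list[int] = field(default_factory=list)

    def mark_freed(self, group: int, count: int) -> None:
        # Immediate "-o discard" would send a TRIM for every freed extent here.
        # Instead, only record that the group has new free space.
        g = self.groups[group]
        g.free_blocks += count
        g.trimmed = False

    def background_trim(self, fs_busy: bool) -> None:
        # Periodic fstrim-like pass, deferred while the filesystem is busy and
        # limited to groups modified since their last TRIM.
        if fs_busy:
            return
        for i, g in enumerate(self.groups):
            if not g.trimmed and g.free_blocks >= self.min_free_to_trim:
                self.trim_requests.append(i)  # one TRIM covering the whole group
                g.trimmed = True


if __name__ == "__main__":
    d = LazyDiscard([BlockGroup() for _ in range(4)], min_free_to_trim=1024)
    d.mark_freed(1, 4096)             # large delete in group 1
    d.mark_freed(2, 10)               # small delete in group 2
    d.background_trim(fs_busy=True)   # deferred, nothing is submitted
    d.background_trim(fs_busy=False)
    print(d.trim_requests)            # [1]; group 2 stays pending
```

Keying the work on block groups rather than individual extents is what lets one request cover a whole group, which is the middle ground between per-extent TRIM and a full fstrim pass.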
The in-kernel discard+fstrim handling could be smarter than "run every day
from cron", because the kernel knows whether the filesystem is busy, how much
data has been written and freed, and when a block group has accumulated
enough free space that it is worthwhile to actually submit a TRIM for it.
The start of that work was posted for discussion on linux-ext4:
https://marc.info/?l=linux-ext4&m=159283169109297&w=4
but the discussion ended up focused on whether or not TRIM needs to obey
the requested boundaries for security reasons.

Cheers, Andreas