From: Andreas Dilger
Subject: Re: [PATCH V2 4/7] ext4: add new helper interface ext4_insert_free_data
Date: Thu, 27 May 2021 14:09:22 -0600
In-Reply-To: <83fab578-b170-c515-d514-1ed366f07e8a@gmail.com>
Cc: Theodore Ts'o, Ext4 Developers List, lishujin@kuaishou.com, linux-fsdevel
To: Wang Jianchao
References: <164ffa3b-c4d5-6967-feba-b972995a6dfb@gmail.com> <49382052-6238-f1fb-40d1-b6b801b39ff7@gmail.com> <48e33dea-d15e-f211-0191-e01bd3eb17b3@gmail.com> <83fab578-b170-c515-d514-1ed366f07e8a@gmail.com>
X-Mailing-List: linux-ext4@vger.kernel.org

On May 26, 2021, at 2:43 AM, Wang Jianchao wrote:
>
> Split the codes that inserts and merges ext4_free_data structures
> into a new interface
> ext4_insert_free_data. This is preparing for
> following async background discard.

Thank you for your patch series.  I think this is an important area to
improve, since the current "-o discard" option adds too much overhead
to be really usable in practice.

One problem with tracking the fine-grained freed extents and then using
them directly to submit TRIM requests is that the underlying device may
ignore TRIM requests that are too small.  Submitting the TRIM right
after each transaction commit does not allow much time for freed blocks
to be aggregated (e.g. "rm -r" of a big directory tree), so it would
be better to delay TRIM requests until more freed extents can be merged.
Since most users only run fstrim once a day or every few days, it makes
sense to allow time to merge freed space (tunable, maybe 5-15 minutes).

However, tracking the rbtree for each group may be quite a lot of overhead
if this is kept in memory for minutes or hours, so minimizing the memory
usage to track freed extents is also important.

We discussed on the ext4 developer call today whether it is necessary
to track the fine-grained free extents in memory, or if it would be
better to only track min/max freed blocks within each group?  Depending
on the fragmentation of the free blocks in the group, it may be enough
to just store a single bit in each group (as is done today), and only
clear this when there are blocks freed in the group.

Either way, the improvement would be that the kernel is scheduling
groups to be trimmed, and submitting TRIM requests at a much larger size,
instead of depending on userspace to run fstrim.  This also allows the
fstrim scheduler to decide when the device is less busy and submit more
TRIM requests, and back off when the device is busy.

The other potential improvement is to track the TRIMMED state persistently
in the block groups, so that unmount/remount doesn't result in every group
being trimmed again.
It would be good to refresh and include patches from:

"ext4: introduce EXT4_BG_WAS_TRIMMED to optimize trim"
https://patchwork.ozlabs.org/project/linux-ext4/list/?series=184981

and

e2fsprogs: add EXT2_FLAG_BG_WAS_TRIMMED to optimize fstrim
https://patchwork.ozlabs.org/project/linux-ext4/list/?series=179639

along with this series.

> Signed-off-by: Wang Jianchao
> ---
>  fs/ext4/mballoc.c | 96 +++++++++++++++++++++++++++++-------------------------
>  1 file changed, 51 insertions(+), 45 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 85418cf..16f06d2 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -350,6 +350,12 @@ static void ext4_mb_generate_from_pa(struct super_block *sb, void *bitmap,
>  static void ext4_mb_generate_from_freelist(struct super_block *sb, void *bitmap,
>  						ext4_group_t group);
>  static void ext4_mb_new_preallocation(struct ext4_allocation_context *ac);
> +static inline struct ext4_free_data *efd_entry(struct rb_node *n)
> +{
> +	return rb_entry_safe(n, struct ext4_free_data, efd_node);
> +}
> +static int ext4_insert_free_data(struct ext4_sb_info *sbi,
> +		struct rb_root *root, struct ext4_free_data *nfd);
>
>  /*
>   * The algorithm using this percpu seq counter goes below:
> @@ -5069,28 +5075,53 @@ static void ext4_try_merge_freed_extent(struct ext4_sb_info *sbi,
>  	kmem_cache_free(ext4_free_data_cachep, entry);
>  }
>
> +static int ext4_insert_free_data(struct ext4_sb_info *sbi,
> +		struct rb_root *root, struct ext4_free_data *nfd)
> +{
> +	struct rb_node **n = &root->rb_node;
> +	struct rb_node *p = NULL;
> +	struct ext4_free_data *fd;
> +
> +	while (*n) {
> +		p = *n;
> +		fd = rb_entry(p, struct ext4_free_data, efd_node);
> +		if (nfd->efd_start_cluster < fd->efd_start_cluster)
> +			n = &(*n)->rb_left;
> +		else if (nfd->efd_start_cluster >=
> +			 (fd->efd_start_cluster + fd->efd_count))
> +			n = &(*n)->rb_right;
> +		else
> +			return -EINVAL;
> +	}
> +
> +	rb_link_node(&nfd->efd_node, p, n);
> +	rb_insert_color(&nfd->efd_node, root);
> +
> +	/* Now try to see the extent can be merged to left and right */
> +	fd = efd_entry(rb_prev(&nfd->efd_node));
> +	if (fd)
> +		ext4_try_merge_freed_extent(sbi, fd, nfd, root);
> +
> +	fd = efd_entry(rb_next(&nfd->efd_node));
> +	if (fd)
> +		ext4_try_merge_freed_extent(sbi, fd, nfd, root);
> +
> +	return 0;
> +}
> +
>  static noinline_for_stack int
>  ext4_mb_free_metadata(handle_t *handle, struct ext4_buddy *e4b,
> -		      struct ext4_free_data *new_entry)
> +		      struct ext4_free_data *nfd)
>  {
> -	ext4_group_t group = e4b->bd_group;
> -	ext4_grpblk_t cluster;
> -	ext4_grpblk_t clusters = new_entry->efd_count;
> -	struct ext4_free_data *entry;
>  	struct ext4_group_info *db = e4b->bd_info;
>  	struct super_block *sb = e4b->bd_sb;
>  	struct ext4_sb_info *sbi = EXT4_SB(sb);
> -	struct rb_node **n = &db->bb_free_root.rb_node, *node;
> -	struct rb_node *parent = NULL, *new_node;
>
>  	BUG_ON(!ext4_handle_valid(handle));
>  	BUG_ON(e4b->bd_bitmap_page == NULL);
>  	BUG_ON(e4b->bd_buddy_page == NULL);
>
> -	new_node = &new_entry->efd_node;
> -	cluster = new_entry->efd_start_cluster;
> -
> -	if (!*n) {
> +	if (!db->bb_free_root.rb_node) {
>  		/* first free block exent. We need to
>  		   protect buddy cache from being freed,
>  		 * otherwise we'll refresh it from
> @@ -5099,44 +5130,19 @@ static void ext4_try_merge_freed_extent(struct ext4_sb_info *sbi,
>  		get_page(e4b->bd_buddy_page);
>  		get_page(e4b->bd_bitmap_page);
>  	}
> -	while (*n) {
> -		parent = *n;
> -		entry = rb_entry(parent, struct ext4_free_data, efd_node);
> -		if (cluster < entry->efd_start_cluster)
> -			n = &(*n)->rb_left;
> -		else if (cluster >= (entry->efd_start_cluster + entry->efd_count))
> -			n = &(*n)->rb_right;
> -		else {
> -			ext4_grp_locked_error(sb, group, 0,
> -				ext4_group_first_block_no(sb, group) +
> -				EXT4_C2B(sbi, cluster),
> -				"Block already on to-be-freed list");
> -			kmem_cache_free(ext4_free_data_cachep, new_entry);
> -			return 0;
> -		}
> -	}
> -
> -	rb_link_node(new_node, parent, n);
> -	rb_insert_color(new_node, &db->bb_free_root);
> -
> -	/* Now try to see the extent can be merged to left and right */
> -	node = rb_prev(new_node);
> -	if (node) {
> -		entry = rb_entry(node, struct ext4_free_data, efd_node);
> -		ext4_try_merge_freed_extent(sbi, entry, new_entry,
> -					    &(db->bb_free_root));
> -	}
>
> -	node = rb_next(new_node);
> -	if (node) {
> -		entry = rb_entry(node, struct ext4_free_data, efd_node);
> -		ext4_try_merge_freed_extent(sbi, entry, new_entry,
> -					    &(db->bb_free_root));
> +	if (ext4_insert_free_data(sbi, &db->bb_free_root, nfd)) {
> +		ext4_grp_locked_error(sb, e4b->bd_group, 0,
> +				ext4_group_first_block_no(sb, e4b->bd_group) +
> +				EXT4_C2B(sbi, nfd->efd_start_cluster),
> +				"Block already on to-be-freed list");
> +		kmem_cache_free(ext4_free_data_cachep, nfd);
> +		return 0;
>  	}
>
>  	spin_lock(&sbi->s_md_lock);
> -	list_add_tail(&new_entry->efd_list, &sbi->s_freed_data_list);
> -	sbi->s_mb_free_pending += clusters;
> +	list_add_tail(&nfd->efd_list, &sbi->s_freed_data_list);
> +	sbi->s_mb_free_pending += nfd->efd_count;
>  	spin_unlock(&sbi->s_md_lock);
>  	return 0;
>  }
> --
> 1.8.3.1
>

Cheers, Andreas