From: harshad shirwadkar
Date: Mon, 6 Apr 2020 23:39:22 -0700
Subject: Re: [PATCH 2/2] ext4: shrink directories on dentry delete
To: Andreas Dilger
Cc: Ext4 Developers List <linux-ext4@vger.kernel.org>

Thanks for the review, Andreas.

On Sat, Mar 28, 2020 at 5:01 PM Andreas Dilger wrote:
>
> On Mar 25, 2020, at 3:37 AM, Harshad Shirwadkar wrote:
> >
> > This patch adds shrinking support for htree-based directories. The
> > high-level algorithm is as follows:
> >
> > * If, after a dentry removal, the dirent block (call it B) becomes
> >   empty, remove its references in its dx parent.
> > * Swap its contents with those of the last block (L) in the
> >   directory.
> > * Update L's parents to point to B instead.
> > * Remove L.
> > * Repeat this for all the ancestors of B.
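
In pseudo-C, the swap-with-last-block idea looks roughly like the
sketch below; every helper named here is hypothetical shorthand for
the steps above, not a function from the patch:

	static void shrink_after_empty(struct inode *dir, ext4_lblk_t b)
	{
		/* b is the dirent block that was just emptied */
		ext4_lblk_t l = last_dir_block(dir);	/* hypothetical */

		remove_parent_ref(dir, b);	/* drop B from its dx parent */
		if (b != l) {
			swap_dir_blocks(dir, b, l);	/* move L into B's slot */
			repoint_parents(dir, l, b);	/* L's parents now point to B */
		}
		truncate_last_block(dir);	/* free L, shrinking i_size */
		/* if B's dx parent became empty too, repeat one level up */
	}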
> >
> > We add variants of dx_probe that allow us to perform reverse lookups
> > from a logical block to its dx parents.
> >
> > Ran kvm-xfstests smoke and verified that no new failures are
> > introduced. Ran shrinking for directories with the following numbers
> > of files, deleting the files one by one:
> >
> > * 1000   (size before deletion 36K,  after deletion 4K)
> > * 10000  (size before deletion 196K, after deletion 4K)
> > * 100000 (size before deletion 2.1M, after deletion 4K)
> > * 200000 (size before deletion 4.2M, after deletion 4K)
> >
> > In all cases the directory shrank significantly. We fall back to a
> > linear directory if the directory becomes empty.
> >
> > Note, however, that in the average case most of the shrinking
> > happens during the last 1-2% of the deletions. Therefore, the next
> > step here is to merge dx nodes when possible. That could be achieved
> > by storing a fullness index in the htree nodes, but that is an
> > on-disk format change. We can instead build on the tooling added by
> > this patch to perform a reverse lookup on a dx node and then read
> > the adjacent nodes to check their fullness.
> >
> > This patch supersedes the other directory shrinking patch sent in
> > Aug 2019 ("ext4: attempt to shrink directory on dentry removal").
> >
> > Signed-off-by: Harshad Shirwadkar
> > ---
> >  fs/ext4/ext4_jbd2.h |   7 +
> >  fs/ext4/namei.c     | 355 +++++++++++++++++++++++++++++++++++++++--
> >  2 files changed, 353 insertions(+), 9 deletions(-)
> >
> > diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> > index d567b9589875..b78c6f9a6eba 100644
> > --- a/fs/ext4/namei.c
> > +++ b/fs/ext4/namei.c
> >
> > +/*
> > + * dx_probe with relaxed checks. This function is used in the
> > + * directory shrinking code, since we can run into intermediate
> > + * states where we have internal dx nodes with count = 0.
> > + */
> > +static inline struct dx_frame *
> > +dx_probe_relaxed(struct ext4_filename *fname, struct inode *dir,
> > +		 struct dx_hash_info *hinfo, struct dx_frame *frame_in)
> > +{
> > +	return __dx_probe(fname, dir, hinfo, frame_in, 0, false);
> > +}
> > +
> > +/*
> > + * Perform only a parttial dx_probe until we find block end_lblk.
>
> (typo) "partial"

Ack

> > +static inline struct dx_frame *
> > +dx_probe_partial(struct ext4_filename *fname, struct inode *dir,
> > +		  struct dx_hash_info *hinfo, struct dx_frame *frame_in,
> > +		  ext4_lblk_t end_lblk)
> > +{
> > +	return __dx_probe(fname, dir, hinfo, frame_in, end_lblk, false);
> > +}
> > +
>
> [snip]

Ack

> > +/*
> > + * This function tries to remove the entry of a dirent block (which
> > + * was just emptied by the caller) from the dx frame. It does so by
> > + * reducing the count by 1 and left-shifting all the entries after
> > + * the deleted entry.
> > + */
> > +int
> > +ext4_remove_dx_entry(handle_t *handle, struct inode *dir,
> > +		     struct dx_frame *dx_frame)
> > +{
> > +	err = ext4_journal_get_write_access(handle, dx_frame->bh);
> > +	if (err) {
> > +		ext4_std_error(dir->i_sb, err);
> > +		return -EINVAL;
> > +	}
> > +
> > +	for (; i < count - 1; i++)
> > +		entries[i] = entries[i + 1];
>
> It would be more efficient to do this with "memmove()" rather than
> copying each entry separately.

Ack, implemented in V2.
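
For reference, a minimal sketch of the memmove() form being suggested,
assuming the same entries/i/count variables as the loop above (struct
dx_entry is the 8-byte hash/block pair defined in namei.c):

	/* Shift entries i+1 .. count-1 down by one slot in a single
	 * call instead of copying one struct dx_entry at a time. */
	memmove(&entries[i], &entries[i + 1],
		(count - 1 - i) * sizeof(struct dx_entry));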
> > +	/*
> > +	 * If i was 0 when we began the above loop, we would have
> > +	 * overwritten the count and limit values, since those values
> > +	 * live in dx_entry->hash of the first entry. We need to update
> > +	 * count, but we should set limit as well.
> > +	 */
> > +	dx_set_count(entries, count - 1);
> > +	dx_set_limit(entries, limit);
>
> How hard is it to avoid clobbering these fields in the first place?
> I'm just thinking that "clobber + fixup" has been subject to race
> conditions at various times in the past, and may become an issue in
> the future (e.g. with parallel directory operations).

Ack, I agree it's better not to clobber them in the first place.
Implemented in V2.

> > +static inline bool is_empty_dirent_block(struct inode *dir,
> > +					 struct buffer_head *bh)
> > +{
>
> This should be combined with ext4_empty_dir() to avoid code duplication.

Thanks, this also simplifies ext4_empty_dir().

> > +	struct ext4_dir_entry_2 *de =
> > +		(struct ext4_dir_entry_2 *)bh->b_data;
> > +	int csum_size = 0;
> > +
> > +	if (ext4_has_metadata_csum(dir->i_sb) && is_dx(dir))
> > +		csum_size = sizeof(struct ext4_dir_entry_tail);
> > +
> > +	return ext4_rec_len_from_disk(de->rec_len,
> > +				      dir->i_sb->s_blocksize) ==
> > +		dir->i_sb->s_blocksize - csum_size && de->inode == 0;
> > +}
>
> This looks like a low-cost way to determine that the leaf block is
> empty, but checking this for every unlink likely has a non-zero cost.

Ack

> > @@ -2530,6 +2864,9 @@ static int ext4_delete_entry(handle_t *handle,
> >  		if (unlikely(err))
> >  			goto out;
> >
> > +		if (is_dx(dir))
> > +			ext4_try_dir_shrink(handle, dir, lblk, bh);
> > +
> >  		return 0;
>
> It would be useful to run a comparison benchmark between the patched
> and unpatched ext4 when deleting a large number of entries, checking
> both CPU usage and performance. That will give us an idea of how much
> it costs to do this check for every entry.

CPU performance wasn't very different between the two cases, but there
are some differences in overall unlink performance. I added a
performance evaluation section in V2.

> Also, rather than calling ext4_try_dir_shrink() and
> is_empty_dirent_block() for every entry, couldn't this be returned
> from ext4_generic_delete_entry(), since it has that information
> already?

Thanks, implemented this in V2.

Thanks,
Harshad

> Cheers, Andreas