From: Gabriel Krisman Bertazi
To: Arnaud Ferraris
Cc: linux-ext4@vger.kernel.org, drosen@google.com, ebiggers@kernel.org, tytso@mit.edu
Subject: Re: [PATCH RESEND v2 07/12] e2fsck: Support casefold directories when rehashing
Organization: Collabora
References: <20201210150353.91843-1-arnaud.ferraris@collabora.com>
	<20201210150353.91843-8-arnaud.ferraris@collabora.com>
Date: Thu, 10 Dec 2020 17:53:57 -0300
In-Reply-To: <20201210150353.91843-8-arnaud.ferraris@collabora.com> (Arnaud Ferraris's message of "Thu, 10 Dec 2020 16:03:48 +0100")
Message-ID: <87y2i51ixm.fsf@collabora.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
Precedence: bulk
X-Mailing-List: linux-ext4@vger.kernel.org

Arnaud Ferraris writes:

> From: Gabriel Krisman Bertazi
>
> When rehashing a +F directory, the casefold comparison needs to be
> performed, in order to identify duplicated filenames. Like the -F
> version, This is done in two steps, first adapt the qsort comparison to
> consider casefolded directories, and then iterate over the sorted list
> fixing dups.
>
> Signed-off-by: Gabriel Krisman Bertazi
> Signed-off-by: Arnaud Ferraris
> ---
>  e2fsck/rehash.c | 88 ++++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 72 insertions(+), 16 deletions(-)
>
> diff --git a/e2fsck/rehash.c b/e2fsck/rehash.c
> index 30e510a6..14215011 100644
> --- a/e2fsck/rehash.c
> +++ b/e2fsck/rehash.c
> @@ -214,6 +214,23 @@ static EXT2_QSORT_TYPE ino_cmp(const void *a, const void *b)
>  	return (he_a->ino - he_b->ino);
>  }
>
> +struct name_cmp_ctx
> +{
> +	int casefold;
> +	const struct ext2fs_nls_table *tbl;
> +};
> +
> +
> +static int same_name(const struct name_cmp_ctx *cmp_ctx, char *s1,
> +		     int len1, char *s2, int len2)
> +{
> +	if (!cmp_ctx->casefold)
> +		return (len1 == len2 && !memcmp(s1, s2, len1));
> +	else
> +		return !ext2fs_casefold_cmp(cmp_ctx->tbl,
> +					    s1, len1, s2, len2);
> +}
> +
>  /* Used for sorting the hash entry */
>  static EXT2_QSORT_TYPE name_cmp(const void *a, const void *b)
>  {
> @@ -240,9 +257,35 @@ static EXT2_QSORT_TYPE name_cmp(const void *a, const void *b)
>  	return ret;
>  }
>
> +static EXT2_QSORT_TYPE name_cf_cmp(const struct name_cmp_ctx *ctx,
> +				   const void *a, const void *b)
> +{
> +	const struct hash_entry *he_a = (const struct hash_entry *) a;
> +	const struct hash_entry *he_b = (const struct hash_entry *) b;
> +	unsigned int he_a_len, he_b_len, min_len;
> +	int ret;
> +
> +	he_a_len = ext2fs_dirent_name_len(he_a->dir);
> +	he_b_len = ext2fs_dirent_name_len(he_b->dir);
> +
> +	ret = ext2fs_casefold_cmp(ctx->tbl, he_a->dir->name, he_a_len,
> +				  he_b->dir->name, he_b_len);
> +	if (ret == 0) {
> +		if (he_a_len > he_b_len)
> +			ret = 1;
> +		else if (he_a_len < he_b_len)
> +			ret = -1;
> +		else
> +			ret = he_b->dir->inode - he_a->dir->inode;
> +	}
> +	return ret;
> +}
> +
> +

extra line

>  /* Used for sorting the hash entry */
> -static EXT2_QSORT_TYPE hash_cmp(const void *a, const void *b)
> +static EXT2_QSORT_TYPE hash_cmp(const void *a, const void *b, void *arg)
>  {
> +	const struct name_cmp_ctx *ctx = (struct name_cmp_ctx *) arg;
>  	const struct hash_entry *he_a = (const struct hash_entry *) a;
>  	const struct hash_entry *he_b = (const struct hash_entry *) b;
>  	int ret;
> @@ -256,8 +299,12 @@ static EXT2_QSORT_TYPE hash_cmp(const void *a, const void *b)
>  			ret = 1;
>  		else if (he_a->minor_hash < he_b->minor_hash)
>  			ret = -1;
> -		else
> -			ret = name_cmp(a, b);
> +		else {
> +			if (ctx->casefold)
> +				ret = name_cf_cmp(ctx, a, b);
> +			else
> +				ret = name_cmp(a, b);
> +		}
>  	}
>  	return ret;
>  }
> @@ -380,7 +427,8 @@ static void mutate_name(char *str, unsigned int *len)
>
>  static int duplicate_search_and_fix(e2fsck_t ctx, ext2_filsys fs,
>  				    ext2_ino_t ino,
> -				    struct fill_dir_struct *fd)
> +				    struct fill_dir_struct *fd,
> +				    const struct name_cmp_ctx *cmp_ctx)
>  {
>  	struct problem_context pctx;
>  	struct hash_entry *ent, *prev;
> @@ -403,11 +451,12 @@ static int duplicate_search_and_fix(e2fsck_t ctx, ext2_filsys fs,
>  		ent = fd->harray + i;
>  		prev = ent - 1;
>  		if (!ent->dir->inode ||
> -		    (ext2fs_dirent_name_len(ent->dir) !=
> -		     ext2fs_dirent_name_len(prev->dir)) ||
> -		    memcmp(ent->dir->name, prev->dir->name,
> -			   ext2fs_dirent_name_len(ent->dir)))
> +		    !same_name(cmp_ctx, ent->dir->name,
> +			       ext2fs_dirent_name_len(ent->dir),
> +			       prev->dir->name,
> +			       ext2fs_dirent_name_len(prev->dir)))
>  			continue;
> +

noise. Other than that, I think this is still good.

>  		pctx.dirent = ent->dir;
>  		if ((ent->dir->inode == prev->dir->inode) &&
>  		    fix_problem(ctx, PR_2_DUPLICATE_DIRENT, &pctx)) {
> @@ -426,10 +475,11 @@ static int duplicate_search_and_fix(e2fsck_t ctx, ext2_filsys fs,
>  		mutate_name(new_name, &new_len);
>  		for (j=0; j < fd->num_array; j++) {
>  			if ((i==j) ||
> -			    (new_len !=
> -			     (unsigned) ext2fs_dirent_name_len(fd->harray[j].dir)) ||
> -			    memcmp(new_name, fd->harray[j].dir->name, new_len))
> +			    !same_name(cmp_ctx, new_name, new_len,
> +				       fd->harray[j].dir->name,
> +				       ext2fs_dirent_name_len(fd->harray[j].dir))) {
>  				continue;
> +			}
>  			mutate_name(new_name, &new_len);
>
>  			j = -1;
> @@ -894,6 +944,7 @@ errcode_t e2fsck_rehash_dir(e2fsck_t ctx, ext2_ino_t ino,
>  	struct fill_dir_struct	fd = { NULL, NULL, 0, 0, 0, NULL,
>  				       0, 0, 0, 0, 0, 0 };
>  	struct out_dir		outdir = { 0, 0, 0, 0 };
> +	struct name_cmp_ctx name_cmp_ctx = {0, NULL};
>
>  	e2fsck_read_inode(ctx, ino, &inode, "rehash_dir");
>
> @@ -921,6 +972,11 @@ errcode_t e2fsck_rehash_dir(e2fsck_t ctx, ext2_ino_t ino,
>  	fd.compress = 1;
>  	fd.parent = 0;
>
> +	if (fs->encoding && (inode.i_flags & EXT4_CASEFOLD_FL)) {
> +		name_cmp_ctx.casefold = 1;
> +		name_cmp_ctx.tbl = fs->encoding;
> +	}
> +
>  retry_nohash:
>  	/* Read in the entire directory into memory */
>  	retval = ext2fs_block_iterate3(fs, ino, 0, 0,
> @@ -949,16 +1005,16 @@ retry_nohash:
>  	/* Sort the list */
>  resort:
>  	if (fd.compress && fd.num_array > 1)
> -		qsort(fd.harray+2, fd.num_array-2, sizeof(struct hash_entry),
> -		      hash_cmp);
> +		qsort_r(fd.harray+2, fd.num_array-2, sizeof(struct hash_entry),
> +			hash_cmp, &name_cmp_ctx);
>  	else
> -		qsort(fd.harray, fd.num_array, sizeof(struct hash_entry),
> -		      hash_cmp);
> +		qsort_r(fd.harray, fd.num_array, sizeof(struct hash_entry),
> +			hash_cmp, &name_cmp_ctx);
>
>  	/*
>  	 * Look for duplicates
>  	 */
> -	if (duplicate_search_and_fix(ctx, fs, ino, &fd))
> +	if (duplicate_search_and_fix(ctx, fs, ino, &fd, &name_cmp_ctx))
>  		goto resort;
>
>  	if (ctx->options & E2F_OPT_NO) {

-- 
Gabriel Krisman Bertazi
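
For readers following the thread, here is a minimal standalone sketch of the two-step idea from the commit message: sort with a comparator that receives a context, then walk the sorted list looking at adjacent entries. It is not the e2fsck code. The names cmp_ctx, ctx_name_cmp, sort_names and count_dups are hypothetical, the comparison is ASCII-only tolower() rather than ext2fs_casefold_cmp() with an NLS table, the hash ordering is left out, and a small insertion sort stands in for qsort_r() only to keep the sketch portable.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct name_cmp_ctx; there is no NLS table here. */
struct cmp_ctx {
	int casefold;	/* nonzero: compare names case-insensitively */
};

/*
 * Compare two NUL-terminated names under the given context.
 * Assumption: ASCII-only folding; the patch calls ext2fs_casefold_cmp().
 */
static int ctx_name_cmp(const struct cmp_ctx *ctx, const char *a, const char *b)
{
	if (!ctx->casefold)
		return strcmp(a, b);
	for (;; a++, b++) {
		int ca = tolower((unsigned char)*a);
		int cb = tolower((unsigned char)*b);

		if (ca != cb)
			return ca < cb ? -1 : 1;
		if (ca == '\0')
			return 0;
	}
}

/*
 * Step 1: sort with the context-aware comparator.
 * An insertion sort is used here in place of qsort_r().
 */
static void sort_names(const struct cmp_ctx *ctx, const char **names, size_t n)
{
	size_t i, j;

	assert(n == 0 || names != NULL);
	for (i = 1; i < n; i++) {
		const char *key = names[i];

		for (j = i; j > 0 && ctx_name_cmp(ctx, names[j - 1], key) > 0; j--)
			names[j] = names[j - 1];
		names[j] = key;
	}
}

/* Step 2: after sorting, equal names are adjacent, so one pass finds them. */
static size_t count_dups(const struct cmp_ctx *ctx, const char **names, size_t n)
{
	size_t i, dups = 0;

	for (i = 1; i < n; i++)
		if (ctx_name_cmp(ctx, names[i - 1], names[i]) == 0)
			dups++;
	return dups;
}
```

The point of passing the same context to both steps is that the sort and the duplicate scan must agree on what "equal" means; otherwise names that fold to the same string may not end up next to each other.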