From: Arnaud Ferraris
To: linux-ext4@vger.kernel.org
Cc: drosen@google.com, krisman@collabora.com, ebiggers@kernel.org, tytso@mit.edu, Arnaud Ferraris
Subject: [PATCH RESEND v2 07/12] e2fsck: Support casefold directories when rehashing
Date: Thu, 10 Dec 2020 16:03:48 +0100
Message-Id: <20201210150353.91843-8-arnaud.ferraris@collabora.com>
X-Mailer: git-send-email 2.29.2
In-Reply-To: <20201210150353.91843-1-arnaud.ferraris@collabora.com>
References: <20201210150353.91843-1-arnaud.ferraris@collabora.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk
X-Mailing-List: linux-ext4@vger.kernel.org

From: Gabriel Krisman Bertazi

When rehashing a +F directory, the casefold comparison needs to be
performed in order to identify duplicated filenames. Like the -F
version, this is done in two steps: first adapt the qsort comparison to
consider casefolded directories, and then iterate over the sorted list
fixing dups.

Signed-off-by: Gabriel Krisman Bertazi
Signed-off-by: Arnaud Ferraris
---
 e2fsck/rehash.c | 88 ++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 72 insertions(+), 16 deletions(-)

diff --git a/e2fsck/rehash.c b/e2fsck/rehash.c
index 30e510a6..14215011 100644
--- a/e2fsck/rehash.c
+++ b/e2fsck/rehash.c
@@ -214,6 +214,23 @@ static EXT2_QSORT_TYPE ino_cmp(const void *a, const void *b)
 	return (he_a->ino - he_b->ino);
 }
 
+struct name_cmp_ctx
+{
+	int casefold;
+	const struct ext2fs_nls_table *tbl;
+};
+
+
+static int same_name(const struct name_cmp_ctx *cmp_ctx, char *s1,
+		     int len1, char *s2, int len2)
+{
+	if (!cmp_ctx->casefold)
+		return (len1 == len2 && !memcmp(s1, s2, len1));
+	else
+		return !ext2fs_casefold_cmp(cmp_ctx->tbl,
+					    s1, len1, s2, len2);
+}
+
 /* Used for sorting the hash entry */
 static EXT2_QSORT_TYPE name_cmp(const void *a, const void *b)
 {
@@ -240,9 +257,35 @@ static EXT2_QSORT_TYPE name_cmp(const void *a, const void *b)
 	return ret;
 }
 
+static EXT2_QSORT_TYPE name_cf_cmp(const struct name_cmp_ctx *ctx,
+				   const void *a, const void *b)
+{
+	const struct hash_entry *he_a = (const struct hash_entry *) a;
+	const struct hash_entry *he_b = (const struct hash_entry *) b;
+	unsigned int he_a_len, he_b_len, min_len;
+	int ret;
+
+	he_a_len = ext2fs_dirent_name_len(he_a->dir);
+	he_b_len = ext2fs_dirent_name_len(he_b->dir);
+
+	ret = ext2fs_casefold_cmp(ctx->tbl, he_a->dir->name, he_a_len,
+				  he_b->dir->name, he_b_len);
+	if (ret == 0) {
+		if (he_a_len > he_b_len)
+			ret = 1;
+		else if (he_a_len < he_b_len)
+			ret = -1;
+		else
+			ret = he_b->dir->inode - he_a->dir->inode;
+	}
+	return ret;
+}
+
+
 /* Used for sorting the hash entry */
-static EXT2_QSORT_TYPE hash_cmp(const void *a, const void *b)
+static EXT2_QSORT_TYPE hash_cmp(const void *a, const void *b, void *arg)
 {
+	const struct name_cmp_ctx *ctx = (struct name_cmp_ctx *) arg;
 	const struct hash_entry *he_a = (const struct hash_entry *) a;
 	const struct hash_entry *he_b = (const struct hash_entry *) b;
 	int ret;
@@ -256,8 +299,12 @@ static EXT2_QSORT_TYPE hash_cmp(const void *a, const void *b)
 			ret = 1;
 		else if (he_a->minor_hash < he_b->minor_hash)
 			ret = -1;
-		else
-			ret = name_cmp(a, b);
+		else {
+			if (ctx->casefold)
+				ret = name_cf_cmp(ctx, a, b);
+			else
+				ret = name_cmp(a, b);
+		}
 	}
 	return ret;
 }
@@ -380,7 +427,8 @@ static void mutate_name(char *str, unsigned int *len)
 
 static int duplicate_search_and_fix(e2fsck_t ctx, ext2_filsys fs,
 				    ext2_ino_t ino,
-				    struct fill_dir_struct *fd)
+				    struct fill_dir_struct *fd,
+				    const struct name_cmp_ctx *cmp_ctx)
 {
 	struct problem_context pctx;
 	struct hash_entry *ent, *prev;
@@ -403,11 +451,12 @@ static int duplicate_search_and_fix(e2fsck_t ctx, ext2_filsys fs,
 		ent = fd->harray + i;
 		prev = ent - 1;
 		if (!ent->dir->inode ||
-		    (ext2fs_dirent_name_len(ent->dir) !=
-		     ext2fs_dirent_name_len(prev->dir)) ||
-		    memcmp(ent->dir->name, prev->dir->name,
-			   ext2fs_dirent_name_len(ent->dir)))
+		    !same_name(cmp_ctx, ent->dir->name,
+			       ext2fs_dirent_name_len(ent->dir),
+			       prev->dir->name,
+			       ext2fs_dirent_name_len(prev->dir)))
 			continue;
+
 		pctx.dirent = ent->dir;
 		if ((ent->dir->inode == prev->dir->inode) &&
 		    fix_problem(ctx, PR_2_DUPLICATE_DIRENT, &pctx)) {
@@ -426,10 +475,11 @@ static int duplicate_search_and_fix(e2fsck_t ctx, ext2_filsys fs,
 		mutate_name(new_name, &new_len);
 		for (j=0; j < fd->num_array; j++) {
 			if ((i==j) ||
-			    (new_len !=
-			     (unsigned) ext2fs_dirent_name_len(fd->harray[j].dir)) ||
-			    memcmp(new_name, fd->harray[j].dir->name, new_len))
+			    !same_name(cmp_ctx, new_name, new_len,
+				       fd->harray[j].dir->name,
+				       ext2fs_dirent_name_len(fd->harray[j].dir))) {
 				continue;
+			}
 			mutate_name(new_name, &new_len);
 
 			j = -1;
@@ -894,6 +944,7 @@ errcode_t e2fsck_rehash_dir(e2fsck_t ctx, ext2_ino_t ino,
 	struct fill_dir_struct fd = { NULL, NULL, 0, 0, 0, NULL,
 				      0, 0, 0, 0, 0, 0 };
 	struct out_dir outdir = { 0, 0, 0, 0 };
+	struct name_cmp_ctx name_cmp_ctx = {0, NULL};
 
 	e2fsck_read_inode(ctx, ino, &inode, "rehash_dir");
 
@@ -921,6 +972,11 @@ errcode_t e2fsck_rehash_dir(e2fsck_t ctx, ext2_ino_t ino,
 	fd.compress = 1;
 	fd.parent = 0;
 
+	if (fs->encoding && (inode.i_flags & EXT4_CASEFOLD_FL)) {
+		name_cmp_ctx.casefold = 1;
+		name_cmp_ctx.tbl = fs->encoding;
+	}
+
 retry_nohash:
 	/* Read in the entire directory into memory */
 	retval = ext2fs_block_iterate3(fs, ino, 0, 0,
@@ -949,16 +1005,16 @@ retry_nohash:
 	/* Sort the list */
 resort:
 	if (fd.compress && fd.num_array > 1)
-		qsort(fd.harray+2, fd.num_array-2, sizeof(struct hash_entry),
-		      hash_cmp);
+		qsort_r(fd.harray+2, fd.num_array-2, sizeof(struct hash_entry),
+			hash_cmp, &name_cmp_ctx);
 	else
-		qsort(fd.harray, fd.num_array, sizeof(struct hash_entry),
-		      hash_cmp);
+		qsort_r(fd.harray, fd.num_array, sizeof(struct hash_entry),
+			hash_cmp, &name_cmp_ctx);
 
 	/*
 	 * Look for duplicates
 	 */
-	if (duplicate_search_and_fix(ctx, fs, ino, &fd))
+	if (duplicate_search_and_fix(ctx, fs, ino, &fd, &name_cmp_ctx))
 		goto resort;
 
 	if (ctx->options & E2F_OPT_NO) {
-- 
2.29.2
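
For readers who want to try the two-step idea from the commit message
outside of e2fsck (sort with a comparator that receives a casefold
context, then scan adjacent entries for equal names), here is a minimal
standalone sketch. It is not the code from this patch. Every demo_*
name is hypothetical, demo_fold_cmp() is an ASCII-only stand-in for
ext2fs_casefold_cmp(), and the sort key is the name alone rather than
hash, minor hash and name. It also only counts duplicates instead of
repairing them, and it assumes the glibc qsort_r() signature.

```c
#define _GNU_SOURCE		/* assumption: glibc-style qsort_r() */
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the patch's struct name_cmp_ctx. */
struct demo_cmp_ctx {
	int casefold;
};

/* Hypothetical stand-in for struct hash_entry. */
struct demo_entry {
	const char *name;
	unsigned int ino;
};

/* ASCII-only stand-in for ext2fs_casefold_cmp(): <0, 0 or >0. */
static int demo_fold_cmp(const char *a, const char *b)
{
	for (; *a && *b; a++, b++) {
		int d = tolower((unsigned char) *a) -
			tolower((unsigned char) *b);
		if (d)
			return d;
	}
	return (unsigned char) *a - (unsigned char) *b;
}

/* Same role as same_name(): exact match unless casefold is set. */
static int demo_same_name(const struct demo_cmp_ctx *ctx,
			  const char *a, const char *b)
{
	return ctx->casefold ? !demo_fold_cmp(a, b) : !strcmp(a, b);
}

/* Step 1: comparator that gets its context through qsort_r(). */
static int demo_cmp(const void *a, const void *b, void *arg)
{
	const struct demo_cmp_ctx *ctx = arg;
	const struct demo_entry *ea = a, *eb = b;
	int ret;

	ret = ctx->casefold ? demo_fold_cmp(ea->name, eb->name)
			    : strcmp(ea->name, eb->name);
	if (ret == 0)	/* tie-break on inode number */
		ret = (ea->ino > eb->ino) - (ea->ino < eb->ino);
	return ret;
}

/* Step 2: after sorting, equal names sit next to each other. */
static size_t demo_count_dups(struct demo_entry *ents, size_t n,
			      const struct demo_cmp_ctx *ctx)
{
	size_t i, dups = 0;

	assert(ctx != NULL);
	qsort_r(ents, n, sizeof(*ents), demo_cmp, (void *) ctx);
	for (i = 1; i < n; i++)
		if (demo_same_name(ctx, ents[i].name, ents[i - 1].name))
			dups++;
	return dups;
}
```

Calling demo_count_dups() on the same array once with casefold = 1 and
once with casefold = 0 shows why the comparator and the duplicate check
have to agree on the comparison mode: names such as "Foo" and "foo"
only become adjacent, and only compare equal, in the casefold case.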