From: Gabriel Krisman Bertazi
To: tytso@mit.edu
Cc: linux-ext4@vger.kernel.org, Gabriel Krisman Bertazi, kernel@collabora.com
Subject: [PATCH] ext4: Optimize case-insensitive lookups
Date: Mon, 17 Jun 2019 15:02:40 -0400
Message-Id: <20190617190240.30996-1-krisman@collabora.com>
X-Mailer: git-send-email 2.20.1
X-Mailing-List: linux-ext4@vger.kernel.org

Temporarily cache a casefolded version of the file name under lookup in
struct ext4_filename, so the name is casefolded once per lookup instead
of once for every directory entry it is compared against.

I got up to 30% speedup on lookups of large directories (>100k entries),
depending on the length of the string under lookup.

v2:
  - Dynamically allocate space for the casefolded version.

Signed-off-by: Gabriel Krisman Bertazi
---
 fs/ext4/dir.c           |  2 +-
 fs/ext4/ext4.h          | 39 ++++++++++++++++++++++++++++++++++---
 fs/ext4/namei.c         | 43 ++++++++++++++++++++++++++++++++++++-----
 fs/unicode/utf8-core.c  | 28 +++++++++++++++++++++++++++
 include/linux/unicode.h |  3 +++
 5 files changed, 106 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index c7843b149a1e..0a427e18584a 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -674,7 +674,7 @@ static int ext4_d_compare(const struct dentry *dentry, unsigned int len,
 		return memcmp(str, name->name, len);
 	}
 
-	return ext4_ci_compare(dentry->d_parent->d_inode, name, &qstr);
+	return ext4_ci_compare(dentry->d_parent->d_inode, name, &qstr, false);
 }
 
 static int ext4_d_hash(const struct dentry *dentry, struct qstr *str)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 1cb67859e051..c0793d9e5d12 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2077,6 +2077,9 @@ struct ext4_filename {
 #ifdef CONFIG_FS_ENCRYPTION
 	struct fscrypt_str crypto_buf;
 #endif
+#ifdef CONFIG_UNICODE
+	struct fscrypt_str cf_name;
+#endif
 };
 
 #define fname_name(p) ((p)->disk_name.name)
@@ -2302,6 +2305,12 @@ extern unsigned ext4_free_clusters_after_init(struct super_block *sb,
 					     struct ext4_group_desc *gdp);
 ext4_fsblk_t ext4_inode_to_goal_block(struct inode *);
 
+#ifdef CONFIG_UNICODE
+extern void ext4_fname_setup_ci_filename(struct inode *dir,
+					 const struct qstr *iname,
+					 struct fscrypt_str *fname);
+#endif
+
 #ifdef CONFIG_FS_ENCRYPTION
 static inline void ext4_fname_from_fscrypt_name(struct ext4_filename *dst,
 						const struct fscrypt_name *src)
@@ -2328,6 +2337,10 @@ static inline int ext4_fname_setup_filename(struct inode *dir,
 		return err;
 
 	ext4_fname_from_fscrypt_name(fname, &name);
+
+#ifdef CONFIG_UNICODE
+	ext4_fname_setup_ci_filename(dir, iname, &fname->cf_name);
+#endif
 	return 0;
 }
 
@@ -2343,6 +2356,10 @@ static inline int ext4_fname_prepare_lookup(struct inode *dir,
 		return err;
 
 	ext4_fname_from_fscrypt_name(fname, &name);
+
+#ifdef CONFIG_UNICODE
+	ext4_fname_setup_ci_filename(dir, &dentry->d_name, &fname->cf_name);
+#endif
 	return 0;
 }
 
@@ -2356,6 +2373,11 @@ static inline void ext4_fname_free_filename(struct ext4_filename *fname)
 	fname->crypto_buf.name = NULL;
 	fname->usr_fname = NULL;
 	fname->disk_name.name = NULL;
+
+#ifdef CONFIG_UNICODE
+	kfree(fname->cf_name.name);
+	fname->cf_name.name = NULL;
+#endif
 }
 #else /* !CONFIG_FS_ENCRYPTION */
 static inline int ext4_fname_setup_filename(struct inode *dir,
@@ -2366,6 +2388,11 @@ static inline int ext4_fname_setup_filename(struct inode *dir,
 	fname->usr_fname = iname;
 	fname->disk_name.name = (unsigned char *) iname->name;
 	fname->disk_name.len = iname->len;
+
+#ifdef CONFIG_UNICODE
+	ext4_fname_setup_ci_filename(dir, iname, &fname->cf_name);
+#endif
+
 	return 0;
 }
 
@@ -2376,7 +2403,13 @@ static inline int ext4_fname_prepare_lookup(struct inode *dir,
 	return ext4_fname_setup_filename(dir, &dentry->d_name, 1, fname);
 }
 
-static inline void ext4_fname_free_filename(struct ext4_filename *fname) { }
+static inline void ext4_fname_free_filename(struct ext4_filename *fname)
+{
+#ifdef CONFIG_UNICODE
+	kfree(fname->cf_name.name);
+	fname->cf_name.name = NULL;
+#endif
+}
 #endif /* !CONFIG_FS_ENCRYPTION */
 
 /* dir.c */
@@ -3119,8 +3152,8 @@ extern int ext4_handle_dirty_dirent_node(handle_t *handle,
 					 struct inode *inode,
 					 struct buffer_head *bh);
 extern int ext4_ci_compare(const struct inode *parent,
-			   const struct qstr *name,
-			   const struct qstr *entry);
+			   const struct qstr *fname,
+			   const struct qstr *entry, bool quick);
 
 #define S_SHIFT 12
 static const unsigned char ext4_type_by_mode[(S_IFMT >> S_SHIFT) + 1] = {
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index cd01c4a67ffb..4909ced4e672 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1259,19 +1259,24 @@ static void dx_insert_block(struct dx_frame *frame, u32 hash, ext4_lblk_t block)
 #ifdef CONFIG_UNICODE
 /*
  * Test whether a case-insensitive directory entry matches the filename
- * being searched for.
+ * being searched for. If quick is set, assume the name being looked up
+ * is already in the casefolded form.
  *
  * Returns: 0 if the directory entry matches, more than 0 if it
  * doesn't match or less than zero on error.
  */
 int ext4_ci_compare(const struct inode *parent, const struct qstr *name,
-		    const struct qstr *entry)
+		    const struct qstr *entry, bool quick)
 {
 	const struct ext4_sb_info *sbi = EXT4_SB(parent->i_sb);
 	const struct unicode_map *um = sbi->s_encoding;
 	int ret;
 
-	ret = utf8_strncasecmp(um, name, entry);
+	if (quick)
+		ret = utf8_strncasecmp_folded(um, name, entry);
+	else
+		ret = utf8_strncasecmp(um, name, entry);
+
 	if (ret < 0) {
 		/* Handle invalid character sequence as either an error
 		 * or as an opaque byte sequence.
@@ -1287,6 +1292,27 @@ int ext4_ci_compare(const struct inode *parent, const struct qstr *name,
 
 	return ret;
 }
+
+void ext4_fname_setup_ci_filename(struct inode *dir, const struct qstr *iname,
+				  struct fscrypt_str *cf_name)
+{
+	if (!IS_CASEFOLDED(dir)) {
+		cf_name->name = NULL;
+		return;
+	}
+
+	cf_name->name = kmalloc(EXT4_NAME_LEN, GFP_NOFS);
+	if (!cf_name->name)
+		return;
+
+	cf_name->len = utf8_casefold(EXT4_SB(dir->i_sb)->s_encoding,
+				     iname, cf_name->name,
+				     EXT4_NAME_LEN);
+	if (cf_name->len <= 0) {
+		kfree(cf_name->name);
+		cf_name->name = NULL;
+	}
+}
 #endif
 
 /*
@@ -1313,8 +1339,15 @@ static inline bool ext4_match(const struct inode *parent,
 #endif
 
 #ifdef CONFIG_UNICODE
-	if (EXT4_SB(parent->i_sb)->s_encoding && IS_CASEFOLDED(parent))
-		return (ext4_ci_compare(parent, fname->usr_fname, &entry) == 0);
+	if (EXT4_SB(parent->i_sb)->s_encoding && IS_CASEFOLDED(parent)) {
+		if (fname->cf_name.name) {
+			struct qstr cf = {.name = fname->cf_name.name,
+					  .len = fname->cf_name.len};
+			return !ext4_ci_compare(parent, &cf, &entry, true);
+		}
+		return !ext4_ci_compare(parent, fname->usr_fname, &entry,
+					false);
+	}
 #endif
 
 	return fscrypt_match_name(&f, de->name, de->name_len);
diff --git a/fs/unicode/utf8-core.c b/fs/unicode/utf8-core.c
index 6afab4fdce90..71ca4d047d65 100644
--- a/fs/unicode/utf8-core.c
+++ b/fs/unicode/utf8-core.c
@@ -73,6 +73,34 @@ int utf8_strncasecmp(const struct unicode_map *um,
 }
 EXPORT_SYMBOL(utf8_strncasecmp);
 
+/* String cf is expected to be a valid UTF-8 casefolded
+ * string.
+ */
+int utf8_strncasecmp_folded(const struct unicode_map *um,
+			    const struct qstr *cf,
+			    const struct qstr *s1)
+{
+	const struct utf8data *data = utf8nfdicf(um->version);
+	struct utf8cursor cur1;
+	int c1, c2;
+	int i = 0;
+
+	if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0)
+		return -EINVAL;
+
+	do {
+		c1 = utf8byte(&cur1);
+		c2 = cf->name[i++];
+		if (c1 < 0)
+			return -EINVAL;
+		if (c1 != c2)
+			return 1;
+	} while (c1);
+
+	return 0;
+}
+EXPORT_SYMBOL(utf8_strncasecmp_folded);
+
 int utf8_casefold(const struct unicode_map *um, const struct qstr *str,
 		  unsigned char *dest, size_t dlen)
 {
diff --git a/include/linux/unicode.h b/include/linux/unicode.h
index aec2c6d800aa..990aa97d8049 100644
--- a/include/linux/unicode.h
+++ b/include/linux/unicode.h
@@ -17,6 +17,9 @@ int utf8_strncmp(const struct unicode_map *um,
 
 int utf8_strncasecmp(const struct unicode_map *um,
 		     const struct qstr *s1, const struct qstr *s2);
+int utf8_strncasecmp_folded(const struct unicode_map *um,
+			    const struct qstr *cf,
+			    const struct qstr *s1);
 
 int utf8_normalize(const struct unicode_map *um, const struct qstr *str,
 		   unsigned char *dest, size_t dlen);
-- 
2.20.1