Received: by 2002:a05:6a10:9848:0:0:0:0 with SMTP id x8csp3883641pxf; Mon, 29 Mar 2021 14:21:47 -0700 (PDT) X-Google-Smtp-Source: ABdhPJz5L+ILdzUnzQl+crbIFzm9SoCUAWbuOf5qSawZmlPQAi+hgbAeE7r1cBNbetD2lKph5s7I X-Received: by 2002:aa7:cd54:: with SMTP id v20mr30522421edw.80.1617052907225; Mon, 29 Mar 2021 14:21:47 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1617052907; cv=none; d=google.com; s=arc-20160816; b=e3HxKHDC6arI7bvqaJJL3ak0NFpC3Zw02CHiF8NZr1OzxUnGM0oKuzd37++0uoB7Fi klLVL7KXbNNAgGA6L+MkD/YFUoF0a7G582y+JEHblwjuBYtmZOG7FWgfesDCf23PuQft X3F2GH65Yp8/29Ntt6C2/hRLdQuX1/j3cF+BsZkp9M/IQXzqkVDYC0OLJ5I+/2ujE6pn 1QkPajFEICuwTRB/qzW7xKpQi7ZEVlIrRz1fBeGuvQeXwl6Da1kQSVnRGSjI0M5MxjLc 3G7he0HZRXroXqm9OAGOex0iRL4hS5S0v/A6h5ATUpABDzBlUOCMcNiCmVkhZ1bFZpQ6 Pj9Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:user-agent:message-id:in-reply-to :date:references:organization:subject:cc:to:from; bh=x4nIiEjXmFrU7GMS+qD1ZuhvLp4bLzCC3UFT6fMbGmQ=; b=VycN4zsnb2OINGNJNXBgk+GNWOe/NbMrIWbKurJFAcPX4Wr83vlQOUBqCbBqCYARqd ku/5vqFHb341r7Gg7xXHZXupeA+CZL4wb3HnxBHAR28ouDnw8asG1wGLwX2AZBch8NAI O0DcSeUwrnse/kPvWoyMpl4h1E/+WopoGxHK2rTHfLSdfaHOJFBUXWnEn4cNROoW/abx J8GwNd3Tvh2m1/mShy5VIiBYnKHagkrjxnbRF0jGcfCT+txr4W6NQIx9oC6mpwIx9/v9 aGeKCdGVEivcC6D6wQGEsE7Yw3g5TIPSfMpN81lZOJJ7AgU2az0K4bBfYyUqGlzJYMS2 fspw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=collabora.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id n1si6499edv.538.2021.03.29.14.21.24; Mon, 29 Mar 2021 14:21:47 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=collabora.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230280AbhC2VU0 (ORCPT + 99 others); Mon, 29 Mar 2021 17:20:26 -0400 Received: from bhuna.collabora.co.uk ([46.235.227.227]:54788 "EHLO bhuna.collabora.co.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230161AbhC2VUX (ORCPT ); Mon, 29 Mar 2021 17:20:23 -0400 Received: from [127.0.0.1] (localhost [127.0.0.1]) (Authenticated sender: krisman) with ESMTPSA id AED141F458BB From: Gabriel Krisman Bertazi To: Shreeya Patel Cc: tytso@mit.edu, adilger.kernel@dilger.ca, jaegeuk@kernel.org, chao@kernel.org, ebiggers@google.com, drosen@google.com, ebiggers@kernel.org, yuchao0@huawei.com, linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, kernel@collabora.com, andre.almeida@collabora.com Subject: Re: [PATCH v5 4/4] fs: unicode: Add utf8 module and a unicode layer Organization: Collabora References: <20210329204240.359184-1-shreeya.patel@collabora.com> <20210329204240.359184-5-shreeya.patel@collabora.com> Date: Mon, 29 Mar 2021 17:20:18 -0400 In-Reply-To: <20210329204240.359184-5-shreeya.patel@collabora.com> (Shreeya Patel's message of "Tue, 30 Mar 2021 02:12:40 +0530") Message-ID: <87o8f1r71p.fsf@collabora.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Shreeya Patel writes: > utf8data.h_shipped has a large database table which is an auto-generated > decodification trie for the unicode normalization functions. > It is not necessary to load this large table in the kernel if no > filesystem is using it, hence make UTF-8 encoding loadable by converting > it into a module. > Modify the file called unicode-core which will act as a layer for > unicode subsystem. It will load the UTF-8 module and access it's functions > whenever any filesystem that needs unicode is mounted. > Also, indirect calls using function pointers are slow, use static calls to > avoid overhead caused in case of repeated indirect calls. Static calls > improves the performance by directly calling the functions as opposed to > indirect calls. > > Signed-off-by: Shreeya Patel > --- > Changes in v5 > - Rename global variables and default static call functions for better > understanding > - Make only config UNICODE_UTF8 visible and config UNICODE to be always > enabled provided UNICODE_UTF8 is enabled. > - Improve the documentation for Kconfig > - Improve the commit message. > > Changes in v4 > - Return error from the static calls instead of doing nothing and > succeeding even without loading the module. > - Remove the complete usage of utf8_ops and use static calls at all > places. > - Restore the static calls to default values when module is unloaded. > - Decrement the reference of module after calling the unload function. > - Remove spinlock as there will be no race conditions after removing > utf8_ops. > > Changes in v3 > - Add a patch which checks if utf8 is loaded before calling utf8_unload() > in ext4 and f2fs filesystems > - Return error if strscpy() returns value < 0 > - Correct the conditions to prevent NULL pointer dereference while > accessing functions via utf8_ops variable. > - Add spinlock to avoid race conditions. > - Use static_call() for preventing speculative execution attacks. > > Changes in v2 > - Remove the duplicate file from the last patch. > - Make the wrapper functions inline. > - Remove msleep and use try_module_get() and module_put() > for ensuring that module is loaded correctly and also > doesn't get unloaded while in use. > - Resolve the warning reported by kernel test robot. > - Resolve all the checkpatch.pl warnings. > > > fs/unicode/Kconfig | 17 ++- > fs/unicode/Makefile | 5 +- > fs/unicode/unicode-core.c | 241 +++++++---------------------------- > fs/unicode/unicode-utf8.c | 256 ++++++++++++++++++++++++++++++++++++++ > include/linux/unicode.h | 123 +++++++++++++++--- > 5 files changed, 426 insertions(+), 216 deletions(-) > create mode 100644 fs/unicode/unicode-utf8.c > > diff --git a/fs/unicode/Kconfig b/fs/unicode/Kconfig > index 2c27b9a5cd6c..ad4b837f2eb2 100644 > --- a/fs/unicode/Kconfig > +++ b/fs/unicode/Kconfig > @@ -2,13 +2,26 @@ > # > # UTF-8 normalization > # > +# CONFIG_UNICODE will be automatically enabled if CONFIG_UNICODE_UTF8 > +# is enabled. This config option adds the unicode subsystem layer which loads > +# the UTF-8 module whenever any filesystem needs it. > config UNICODE > - bool "UTF-8 normalization and casefolding support" > + bool > + > +# utf8data.h_shipped has a large database table which is an auto-generated > +# decodification trie for the unicode normalization functions and it is not > +# necessary to carry this large table in the kernel. > +# Enabling UNICODE_UTF8 option will allow UTF-8 encoding to be built as a > +# module and this module will be loaded by the unicode subsystem layer only > +# when any filesystem needs it. > +config UNICODE_UTF8 > + tristate "UTF-8 module" > help > Say Y here to enable UTF-8 NFD normalization and NFD+CF casefolding > support. > + select UNICODE > > config UNICODE_NORMALIZATION_SELFTEST > tristate "Test UTF-8 normalization support" > - depends on UNICODE > + depends on UNICODE_UTF8 > default n > diff --git a/fs/unicode/Makefile b/fs/unicode/Makefile > index fbf9a629ed0d..49d50083e6ee 100644 > --- a/fs/unicode/Makefile > +++ b/fs/unicode/Makefile > @@ -1,11 +1,14 @@ > # SPDX-License-Identifier: GPL-2.0 > > obj-$(CONFIG_UNICODE) += unicode.o > +obj-$(CONFIG_UNICODE_UTF8) += utf8.o > obj-$(CONFIG_UNICODE_NORMALIZATION_SELFTEST) += utf8-selftest.o > > -unicode-y := utf8-norm.o unicode-core.o > +unicode-y := unicode-core.o > +utf8-y := unicode-utf8.o utf8-norm.o > > $(obj)/utf8-norm.o: $(obj)/utf8data.h > +$(obj)/unicode-utf8.o: $(obj)/utf8-norm.o > > # In the normal build, the checked-in utf8data.h is just shipped. > # > diff --git a/fs/unicode/unicode-core.c b/fs/unicode/unicode-core.c > index 730dbaedf593..07d42f471e42 100644 > --- a/fs/unicode/unicode-core.c > +++ b/fs/unicode/unicode-core.c > @@ -1,237 +1,80 @@ > /* SPDX-License-Identifier: GPL-2.0 */ > #include > #include > -#include > #include > -#include > #include > #include > -#include > > -#include "utf8n.h" > +static struct module *utf8mod; > > -int unicode_validate(const struct unicode_map *um, const struct qstr *str) > -{ > - const struct utf8data *data = utf8nfdi(um->version); > +DEFINE_STATIC_CALL(_unicode_validate, unicode_validate_default); > +EXPORT_STATIC_CALL(_unicode_validate); > > - if (utf8nlen(data, str->name, str->len) < 0) > - return -1; > - return 0; > -} > -EXPORT_SYMBOL(unicode_validate); > +DEFINE_STATIC_CALL(_unicode_strncmp, unicode_strncmp_default); > +EXPORT_STATIC_CALL(_unicode_strncmp); > > -int unicode_strncmp(const struct unicode_map *um, > - const struct qstr *s1, const struct qstr *s2) > -{ > - const struct utf8data *data = utf8nfdi(um->version); > - struct utf8cursor cur1, cur2; > - int c1, c2; > +DEFINE_STATIC_CALL(_unicode_strncasecmp, unicode_strncasecmp_default); > +EXPORT_STATIC_CALL(_unicode_strncasecmp); Why are these here if the _default functions are defined in the header file? I think the definitions could be in this file. No? > - if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0) > - return -EINVAL; > +DEFINE_STATIC_CALL(_unicode_strncasecmp_folded, unicode_strncasecmp_folded_default); > +EXPORT_STATIC_CALL(_unicode_strncasecmp_folded); > > - if (utf8ncursor(&cur2, data, s2->name, s2->len) < 0) > - return -EINVAL; > +DEFINE_STATIC_CALL(_unicode_normalize, unicode_normalize_default); > +EXPORT_STATIC_CALL(_unicode_normalize); > > - do { > - c1 = utf8byte(&cur1); > - c2 = utf8byte(&cur2); > +DEFINE_STATIC_CALL(_unicode_casefold, unicode_casefold_default); > +EXPORT_STATIC_CALL(_unicode_casefold); > > - if (c1 < 0 || c2 < 0) > - return -EINVAL; > - if (c1 != c2) > - return 1; > - } while (c1); > +DEFINE_STATIC_CALL(_unicode_casefold_hash, unicode_casefold_hash_default); > +EXPORT_STATIC_CALL(_unicode_casefold_hash); > > - return 0; > -} > -EXPORT_SYMBOL(unicode_strncmp); > +DEFINE_STATIC_CALL(_unicode_load, unicode_load_default); > +EXPORT_STATIC_CALL(_unicode_load); > > -int unicode_strncasecmp(const struct unicode_map *um, > - const struct qstr *s1, const struct qstr *s2) > +static int unicode_load_module(void) > { > - const struct utf8data *data = utf8nfdicf(um->version); > - struct utf8cursor cur1, cur2; > - int c1, c2; > - > - if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0) > - return -EINVAL; > - > - if (utf8ncursor(&cur2, data, s2->name, s2->len) < 0) > - return -EINVAL; > - > - do { > - c1 = utf8byte(&cur1); > - c2 = utf8byte(&cur2); > - > - if (c1 < 0 || c2 < 0) > - return -EINVAL; > - if (c1 != c2) > - return 1; > - } while (c1); > - > - return 0; > -} > -EXPORT_SYMBOL(unicode_strncasecmp); > - > -/* String cf is expected to be a valid UTF-8 casefolded > - * string. > - */ > -int unicode_strncasecmp_folded(const struct unicode_map *um, > - const struct qstr *cf, > - const struct qstr *s1) > -{ > - const struct utf8data *data = utf8nfdicf(um->version); > - struct utf8cursor cur1; > - int c1, c2; > - int i = 0; > - > - if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0) > - return -EINVAL; > - > - do { > - c1 = utf8byte(&cur1); > - c2 = cf->name[i++]; > - if (c1 < 0) > - return -EINVAL; > - if (c1 != c2) > - return 1; > - } while (c1); > - > - return 0; > -} > -EXPORT_SYMBOL(unicode_strncasecmp_folded); > - > -int unicode_casefold(const struct unicode_map *um, const struct qstr *str, > - unsigned char *dest, size_t dlen) > -{ > - const struct utf8data *data = utf8nfdicf(um->version); > - struct utf8cursor cur; > - size_t nlen = 0; > - > - if (utf8ncursor(&cur, data, str->name, str->len) < 0) > - return -EINVAL; > - > - for (nlen = 0; nlen < dlen; nlen++) { > - int c = utf8byte(&cur); > - > - dest[nlen] = c; > - if (!c) > - return nlen; > - if (c == -1) > - break; > - } > - return -EINVAL; > -} > -EXPORT_SYMBOL(unicode_casefold); > + int ret = request_module("utf8"); > > -int unicode_casefold_hash(const struct unicode_map *um, const void *salt, > - struct qstr *str) > -{ > - const struct utf8data *data = utf8nfdicf(um->version); > - struct utf8cursor cur; > - int c; > - unsigned long hash = init_name_hash(salt); > - > - if (utf8ncursor(&cur, data, str->name, str->len) < 0) > - return -EINVAL; > - > - while ((c = utf8byte(&cur))) { > - if (c < 0) > - return -EINVAL; > - hash = partial_name_hash((unsigned char)c, hash); > + if (ret) { > + pr_err("Failed to load UTF-8 module\n"); > + return ret; > } > - str->hash = end_name_hash(hash); > return 0; > } > -EXPORT_SYMBOL(unicode_casefold_hash); > > -int unicode_normalize(const struct unicode_map *um, const struct qstr *str, > - unsigned char *dest, size_t dlen) > +struct unicode_map *unicode_load(const char *version) > { > - const struct utf8data *data = utf8nfdi(um->version); > - struct utf8cursor cur; > - ssize_t nlen = 0; > - > - if (utf8ncursor(&cur, data, str->name, str->len) < 0) > - return -EINVAL; > + int ret = unicode_load_module(); Splitting this in two functions sound unnecessary, since the other function just calls request_module. By the way, is there any protection against calling request_module if the module is already loaded? Surely that's not necessary, perhaps try_then_request_module(utf8mod, "utf8")? > - for (nlen = 0; nlen < dlen; nlen++) { > - int c = utf8byte(&cur); > + if (ret) > + return ERR_PTR(ret); > > - dest[nlen] = c; > - if (!c) > - return nlen; > - if (c == -1) > - break; > - } > - return -EINVAL; > + if (!try_module_get(utf8mod)) Can't module_unregister be called in between the register_module and here, and then you have a bogus utf8mod pointer? true, try_module_get checks for NULL, but if you are unlucky module_is_live will it breaks. I still think utf8mod needs to be protected while you don't have a reference to the module. > + return ERR_PTR(-ENODEV); > + else > + return static_call(_unicode_load)(version); > } > -EXPORT_SYMBOL(unicode_normalize); > +EXPORT_SYMBOL(unicode_load); > > -static int unicode_parse_version(const char *version, unsigned int *maj, > - unsigned int *min, unsigned int *rev) > +void unicode_unload(struct unicode_map *um) > { > - substring_t args[3]; > - char version_string[12]; > - static const struct match_token token[] = { > - {1, "%d.%d.%d"}, > - {0, NULL} > - }; > - int ret = strscpy(version_string, version, sizeof(version_string)); > - > - if (ret < 0) > - return ret; > - > - if (match_token(version_string, token, args) != 1) > - return -EINVAL; > - > - if (match_int(&args[0], maj) || match_int(&args[1], min) || > - match_int(&args[2], rev)) > - return -EINVAL; > + kfree(um); > > - return 0; > + if (utf8mod) > + module_put(utf8mod); > } > +EXPORT_SYMBOL(unicode_unload); > > -struct unicode_map *unicode_load(const char *version) > +void unicode_register(struct module *owner) > { > - struct unicode_map *um = NULL; > - int unicode_version; > - > - if (version) { > - unsigned int maj, min, rev; > - > - if (unicode_parse_version(version, &maj, &min, &rev) < 0) > - return ERR_PTR(-EINVAL); > - > - if (!utf8version_is_supported(maj, min, rev)) > - return ERR_PTR(-EINVAL); > - > - unicode_version = UNICODE_AGE(maj, min, rev); > - } else { > - unicode_version = utf8version_latest(); > - printk(KERN_WARNING"UTF-8 version not specified. " > - "Assuming latest supported version (%d.%d.%d).", > - (unicode_version >> 16) & 0xff, > - (unicode_version >> 8) & 0xff, > - (unicode_version & 0xff)); > - } > - > - um = kzalloc(sizeof(struct unicode_map), GFP_KERNEL); > - if (!um) > - return ERR_PTR(-ENOMEM); > - > - um->charset = "UTF-8"; > - um->version = unicode_version; > - > - return um; > + utf8mod = owner; > } > -EXPORT_SYMBOL(unicode_load); > +EXPORT_SYMBOL(unicode_register); > > -void unicode_unload(struct unicode_map *um) > +void unicode_unregister(void) > { > - kfree(um); > + utf8mod = NULL; > } > -EXPORT_SYMBOL(unicode_unload); > +EXPORT_SYMBOL(unicode_unregister); > > MODULE_LICENSE("GPL v2"); > diff --git a/fs/unicode/unicode-utf8.c b/fs/unicode/unicode-utf8.c > new file mode 100644 > index 000000000000..9c6b58239067 > --- /dev/null > +++ b/fs/unicode/unicode-utf8.c > @@ -0,0 +1,256 @@ > +// SPDX-License-Identifier: GPL-2.0 > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include "utf8n.h" > + > +static int utf8_validate(const struct unicode_map *um, const struct qstr *str) > +{ > + const struct utf8data *data = utf8nfdi(um->version); > + > + if (utf8nlen(data, str->name, str->len) < 0) > + return -1; > + return 0; > +} > + > +static int utf8_strncmp(const struct unicode_map *um, > + const struct qstr *s1, const struct qstr *s2) > +{ > + const struct utf8data *data = utf8nfdi(um->version); > + struct utf8cursor cur1, cur2; > + int c1, c2; > + > + if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0) > + return -EINVAL; > + > + if (utf8ncursor(&cur2, data, s2->name, s2->len) < 0) > + return -EINVAL; > + > + do { > + c1 = utf8byte(&cur1); > + c2 = utf8byte(&cur2); > + > + if (c1 < 0 || c2 < 0) > + return -EINVAL; > + if (c1 != c2) > + return 1; > + } while (c1); > + > + return 0; > +} > + > +static int utf8_strncasecmp(const struct unicode_map *um, > + const struct qstr *s1, const struct qstr *s2) > +{ > + const struct utf8data *data = utf8nfdicf(um->version); > + struct utf8cursor cur1, cur2; > + int c1, c2; > + > + if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0) > + return -EINVAL; > + > + if (utf8ncursor(&cur2, data, s2->name, s2->len) < 0) > + return -EINVAL; > + > + do { > + c1 = utf8byte(&cur1); > + c2 = utf8byte(&cur2); > + > + if (c1 < 0 || c2 < 0) > + return -EINVAL; > + if (c1 != c2) > + return 1; > + } while (c1); > + > + return 0; > +} > + > +/* String cf is expected to be a valid UTF-8 casefolded > + * string. > + */ > +static int utf8_strncasecmp_folded(const struct unicode_map *um, > + const struct qstr *cf, > + const struct qstr *s1) > +{ > + const struct utf8data *data = utf8nfdicf(um->version); > + struct utf8cursor cur1; > + int c1, c2; > + int i = 0; > + > + if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0) > + return -EINVAL; > + > + do { > + c1 = utf8byte(&cur1); > + c2 = cf->name[i++]; > + if (c1 < 0) > + return -EINVAL; > + if (c1 != c2) > + return 1; > + } while (c1); > + > + return 0; > +} > + > +static int utf8_casefold(const struct unicode_map *um, const struct qstr *str, > + unsigned char *dest, size_t dlen) > +{ > + const struct utf8data *data = utf8nfdicf(um->version); > + struct utf8cursor cur; > + size_t nlen = 0; > + > + if (utf8ncursor(&cur, data, str->name, str->len) < 0) > + return -EINVAL; > + > + for (nlen = 0; nlen < dlen; nlen++) { > + int c = utf8byte(&cur); > + > + dest[nlen] = c; > + if (!c) > + return nlen; > + if (c == -1) > + break; > + } > + return -EINVAL; > +} > + > +static int utf8_casefold_hash(const struct unicode_map *um, const void *salt, > + struct qstr *str) > +{ > + const struct utf8data *data = utf8nfdicf(um->version); > + struct utf8cursor cur; > + int c; > + unsigned long hash = init_name_hash(salt); > + > + if (utf8ncursor(&cur, data, str->name, str->len) < 0) > + return -EINVAL; > + > + while ((c = utf8byte(&cur))) { > + if (c < 0) > + return -EINVAL; > + hash = partial_name_hash((unsigned char)c, hash); > + } > + str->hash = end_name_hash(hash); > + return 0; > +} > + > +static int utf8_normalize(const struct unicode_map *um, const struct qstr *str, > + unsigned char *dest, size_t dlen) > +{ > + const struct utf8data *data = utf8nfdi(um->version); > + struct utf8cursor cur; > + ssize_t nlen = 0; > + > + if (utf8ncursor(&cur, data, str->name, str->len) < 0) > + return -EINVAL; > + > + for (nlen = 0; nlen < dlen; nlen++) { > + int c = utf8byte(&cur); > + > + dest[nlen] = c; > + if (!c) > + return nlen; > + if (c == -1) > + break; > + } > + return -EINVAL; > +} > + > +static int utf8_parse_version(const char *version, unsigned int *maj, > + unsigned int *min, unsigned int *rev) > +{ > + substring_t args[3]; > + char version_string[12]; > + static const struct match_token token[] = { > + {1, "%d.%d.%d"}, > + {0, NULL} > + }; > + > + int ret = strscpy(version_string, version, sizeof(version_string)); > + > + if (ret < 0) > + return ret; > + > + if (match_token(version_string, token, args) != 1) > + return -EINVAL; > + > + if (match_int(&args[0], maj) || match_int(&args[1], min) || > + match_int(&args[2], rev)) > + return -EINVAL; > + > + return 0; > +} > + > +static struct unicode_map *utf8_load(const char *version) > +{ > + struct unicode_map *um = NULL; > + int unicode_version; > + > + if (version) { > + unsigned int maj, min, rev; > + > + if (utf8_parse_version(version, &maj, &min, &rev) < 0) > + return ERR_PTR(-EINVAL); > + > + if (!utf8version_is_supported(maj, min, rev)) > + return ERR_PTR(-EINVAL); > + > + unicode_version = UNICODE_AGE(maj, min, rev); > + } else { > + unicode_version = utf8version_latest(); > + pr_warn("UTF-8 version not specified. Assuming latest supported version (%d.%d.%d).", > + (unicode_version >> 16) & 0xff, > + (unicode_version >> 8) & 0xff, > + (unicode_version & 0xfe)); > + } > + > + um = kzalloc(sizeof(*um), GFP_KERNEL); > + if (!um) > + return ERR_PTR(-ENOMEM); > + > + um->charset = "UTF-8"; > + um->version = unicode_version; > + > + return um; > +} > + > +static int __init utf8_init(void) > +{ > + static_call_update(_unicode_validate, utf8_validate); > + static_call_update(_unicode_strncmp, utf8_strncmp); > + static_call_update(_unicode_strncasecmp, utf8_strncasecmp); > + static_call_update(_unicode_strncasecmp_folded, utf8_strncasecmp_folded); > + static_call_update(_unicode_normalize, utf8_normalize); > + static_call_update(_unicode_casefold, utf8_casefold); > + static_call_update(_unicode_casefold_hash, utf8_casefold_hash); > + static_call_update(_unicode_load, utf8_load); > + > + unicode_register(THIS_MODULE); > + return 0; > +} > + > +static void __exit utf8_exit(void) > +{ > + static_call_update(_unicode_validate, unicode_validate_default); > + static_call_update(_unicode_strncmp, unicode_strncmp_default); > + static_call_update(_unicode_strncasecmp, unicode_strncasecmp_default); > + static_call_update(_unicode_strncasecmp_folded, unicode_strncasecmp_folded_default); > + static_call_update(_unicode_normalize, unicode_normalize_default); > + static_call_update(_unicode_casefold, unicode_casefold_default); > + static_call_update(_unicode_casefold_hash, unicode_casefold_hash_default); > + static_call_update(_unicode_load, unicode_load_default); > + > + unicode_unregister(); > +} > + > +module_init(utf8_init); > +module_exit(utf8_exit); > + > +MODULE_LICENSE("GPL v2"); > diff --git a/include/linux/unicode.h b/include/linux/unicode.h > index de23f9ee720b..18a1d3db9de5 100644 > --- a/include/linux/unicode.h > +++ b/include/linux/unicode.h > @@ -4,33 +4,128 @@ > > #include > #include > +#include > + > > struct unicode_map { > const char *charset; > int version; > }; > > -int unicode_validate(const struct unicode_map *um, const struct qstr *str); > +static int unicode_warn_on(void) > +{ > + WARN_ON(1); > + return -EIO; > +} Creating this extra function adds the same number of lines than if you write `WARN_ON(1); return -EIO;` in each of the few handlers below, but the later would be more clear, and you already do it for unicode_load_default anyway. :) > + > +static int unicode_validate_default(const struct unicode_map *um, > + const struct qstr *str) > +{ > + return unicode_warn_on(); > +} > + > +static int unicode_strncmp_default(const struct unicode_map *um, > + const struct qstr *s1, > + const struct qstr *s2) > +{ > + return unicode_warn_on(); > +} > + > +static int unicode_strncasecmp_default(const struct unicode_map *um, > + const struct qstr *s1, > + const struct qstr *s2) > +{ > + return unicode_warn_on(); > +} > + > +static int unicode_strncasecmp_folded_default(const struct unicode_map *um, > + const struct qstr *cf, > + const struct qstr *s1) > +{ > + return unicode_warn_on(); > +} > + > +static int unicode_normalize_default(const struct unicode_map *um, > + const struct qstr *str, > + unsigned char *dest, size_t dlen) > +{ > + return unicode_warn_on(); > +} > + > +static int unicode_casefold_default(const struct unicode_map *um, > + const struct qstr *str, > + unsigned char *dest, size_t dlen) > +{ > + return unicode_warn_on(); > +} > > -int unicode_strncmp(const struct unicode_map *um, > - const struct qstr *s1, const struct qstr *s2); > +static int unicode_casefold_hash_default(const struct unicode_map *um, > + const void *salt, struct qstr *str) > +{ > + return unicode_warn_on(); > +} Again, why isn't this in a .c ? Does it need to be here? > > -int unicode_strncasecmp(const struct unicode_map *um, > - const struct qstr *s1, const struct qstr *s2); > -int unicode_strncasecmp_folded(const struct unicode_map *um, > - const struct qstr *cf, > - const struct qstr *s1); > +static struct unicode_map *unicode_load_default(const char *version) > +{ > + unicode_warn_on(); > + return ERR_PTR(-EIO); > +} > > -int unicode_normalize(const struct unicode_map *um, const struct qstr *str, > - unsigned char *dest, size_t dlen); > +DECLARE_STATIC_CALL(_unicode_validate, unicode_validate_default); > +DECLARE_STATIC_CALL(_unicode_strncmp, unicode_strncmp_default); > +DECLARE_STATIC_CALL(_unicode_strncasecmp, unicode_strncasecmp_default); > +DECLARE_STATIC_CALL(_unicode_strncasecmp_folded, unicode_strncasecmp_folded_default); > +DECLARE_STATIC_CALL(_unicode_normalize, unicode_normalize_default); > +DECLARE_STATIC_CALL(_unicode_casefold, unicode_casefold_default); > +DECLARE_STATIC_CALL(_unicode_casefold_hash, unicode_casefold_hash_default); > +DECLARE_STATIC_CALL(_unicode_load, unicode_load_default); nit: I hate this functions starting with a single _ . they are not common in the rest of the kernel either. > -int unicode_casefold(const struct unicode_map *um, const struct qstr *str, > - unsigned char *dest, size_t dlen); > +static inline int unicode_validate(const struct unicode_map *um, const struct qstr *str) > +{ > + return static_call(_unicode_validate)(um, str); > +} > > -int unicode_casefold_hash(const struct unicode_map *um, const void *salt, > - struct qstr *str); > +static inline int unicode_strncmp(const struct unicode_map *um, > + const struct qstr *s1, const struct qstr *s2) > +{ > + return static_call(_unicode_strncmp)(um, s1, s2); > +} > + > +static inline int unicode_strncasecmp(const struct unicode_map *um, > + const struct qstr *s1, const struct qstr *s2) > +{ > + return static_call(_unicode_strncasecmp)(um, s1, s2); > +} > + > +static inline int unicode_strncasecmp_folded(const struct unicode_map *um, > + const struct qstr *cf, > + const struct qstr *s1) > +{ > + return static_call(_unicode_strncasecmp_folded)(um, cf, s1); > +} > + > +static inline int unicode_normalize(const struct unicode_map *um, const struct qstr *str, > + unsigned char *dest, size_t dlen) > +{ > + return static_call(_unicode_normalize)(um, str, dest, dlen); > +} > + > +static inline int unicode_casefold(const struct unicode_map *um, const struct qstr *str, > + unsigned char *dest, size_t dlen) > +{ > + return static_call(_unicode_casefold)(um, str, dest, dlen); > +} > + > +static inline int unicode_casefold_hash(const struct unicode_map *um, const void *salt, > + struct qstr *str) > +{ > + return static_call(_unicode_casefold_hash)(um, salt, str); > +} > > struct unicode_map *unicode_load(const char *version); > void unicode_unload(struct unicode_map *um); > > +void unicode_register(struct module *owner); > +void unicode_unregister(void); > + > #endif /* _LINUX_UNICODE_H */ -- Gabriel Krisman Bertazi