LinuxLists.cc - [PATCH v10 1/4] unicode: Add utf8_casefold

2020-07-07 11:46:40

Subject: [PATCH v10 1/4] unicode: Add utf8_casefold_hash

This adds a case insensitive hash function to allow taking the hash
without needing to allocate a casefolded copy of the string.

The existing d_hash implementations for casefolding allocates memory
within rcu-walk, by avoiding it we can be more efficient and avoid
worrying about a failed allocation.

Signed-off-by: Daniel Rosenberg <[email protected]>
---
fs/unicode/utf8-core.c | 23 ++++++++++++++++++++++-
include/linux/unicode.h | 3 +++
2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/fs/unicode/utf8-core.c b/fs/unicode/utf8-core.c
index 2a878b739115..dc25823bfed9 100644
--- a/fs/unicode/utf8-core.c
+++ b/fs/unicode/utf8-core.c
@@ -6,6 +6,7 @@
#include <linux/parser.h>
#include <linux/errno.h>
#include <linux/unicode.h>
+#include <linux/stringhash.h>

#include "utf8n.h"

@@ -122,9 +123,29 @@ int utf8_casefold(const struct unicode_map *um, const struct qstr *str,
}
return -EINVAL;
}
-
EXPORT_SYMBOL(utf8_casefold);

+int utf8_casefold_hash(const struct unicode_map *um, const void *salt,
+		       struct qstr *str)
+{
+	const struct utf8data *data = utf8nfdicf(um->version);
+	struct utf8cursor cur;
+	int c;
+	unsigned long hash = init_name_hash(salt);
+
+	if (utf8ncursor(&cur, data, str->name, str->len) < 0)
+		return -EINVAL;
+
+	while ((c = utf8byte(&cur))) {
+		if (c < 0)
+			return -EINVAL;
+		hash = partial_name_hash((unsigned char)c, hash);
+	}
+	str->hash = end_name_hash(hash);
+	return 0;
+}
+EXPORT_SYMBOL(utf8_casefold_hash);
+
int utf8_normalize(const struct unicode_map *um, const struct qstr *str,
unsigned char *dest, size_t dlen)
{
diff --git a/include/linux/unicode.h b/include/linux/unicode.h
index 990aa97d8049..74484d44c755 100644
--- a/include/linux/unicode.h
+++ b/include/linux/unicode.h
@@ -27,6 +27,9 @@ int utf8_normalize(const struct unicode_map *um, const struct qstr *str,
int utf8_casefold(const struct unicode_map *um, const struct qstr *str,
unsigned char *dest, size_t dlen);

+int utf8_casefold_hash(const struct unicode_map *um, const void *salt,
+		       struct qstr *str);
+
struct unicode_map *utf8_load(const char *version);
void utf8_unload(struct unicode_map *um);

--
2.27.0.212.ge8ba1cc988-goog
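
The idea in the patch above, feeding each casefolded byte straight into the running hash instead of first building a folded copy of the name, can be illustrated with a small userspace sketch. Everything in it is an assumption made for illustration: the sketch_* names are hypothetical, FNV-1a stands in for the kernel's init_name_hash()/partial_name_hash()/end_name_hash(), an integer salt stands in for the pointer salt, and ASCII tolower() stands in for the NFD + casefold cursor (utf8ncursor()/utf8byte()). It is not the kernel implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for init_name_hash() and partial_name_hash().
 * This is plain FNV-1a, NOT the hash the kernel uses.
 */
static uint32_t sketch_init_hash(uint32_t salt)
{
	return 2166136261u ^ salt;
}

static uint32_t sketch_partial_hash(unsigned char c, uint32_t hash)
{
	return (hash ^ c) * 16777619u;
}

/*
 * Hash a name case-insensitively, one byte at a time, without any
 * temporary buffer. Assumption: ASCII tolower() replaces the Unicode
 * casefolding cursor that the patch iterates with utf8byte().
 */
static uint32_t sketch_casefold_hash(uint32_t salt, const char *name,
				     size_t len)
{
	uint32_t hash = sketch_init_hash(salt);
	size_t i;

	assert(name != NULL || len == 0);

	for (i = 0; i < len; i++) {
		/* Fold the byte, then mix it in; nothing is stored. */
		unsigned char folded =
			(unsigned char)tolower((unsigned char)name[i]);

		hash = sketch_partial_hash(folded, hash);
	}
	return hash;
}
```

With this sketch, sketch_casefold_hash(0, "Hello", 5) and sketch_casefold_hash(0, "hELLO", 5) return the same value, and no memory is allocated on the way, which is the property the commit message relies on for rcu-walk.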

2020-07-07 17:51:26

by Gabriel Krisman Bertazi

Subject: Re: [PATCH v10 1/4] unicode: Add utf8_casefold_hash

Daniel Rosenberg <[email protected]> writes:

> This adds a case insensitive hash function to allow taking the hash
> without needing to allocate a casefolded copy of the string.
>
> The existing d_hash implementations for casefolding allocates memory
> within rcu-walk, by avoiding it we can be more efficient and avoid
> worrying about a failed allocation.
>
> Signed-off-by: Daniel Rosenberg <[email protected]>

Reviewed-by: Gabriel Krisman Bertazi <[email protected]>

--
Gabriel Krisman Bertazi

2020-07-08 01:38:05

by Eric Biggers

Subject: Re: [PATCH v10 1/4] unicode: Add utf8_casefold_hash

On Tue, Jul 07, 2020 at 04:31:20AM -0700, Daniel Rosenberg wrote:
> This adds a case insensitive hash function to allow taking the hash
> without needing to allocate a casefolded copy of the string.
>
> The existing d_hash implementations for casefolding allocates memory
> within rcu-walk, by avoiding it we can be more efficient and avoid
> worrying about a failed allocation.
>
> Signed-off-by: Daniel Rosenberg <[email protected]>

You can add:

Reviewed-by: Eric Biggers <[email protected]>

If you have a chance please fix the grammar in the commit message though:

"The existing d_hash implementations for casefolding allocate memory
within rcu-walk. By avoiding this we can be more efficient and avoid
worrying about a failed allocation."

- Eric