2020-02-12 03:38:32

by Eric Biggers

Subject: Re: [PATCH v7 1/8] unicode: Add utf8_casefold_iter

On Fri, Feb 07, 2020 at 05:35:45PM -0800, Daniel Rosenberg wrote:
> This function will allow other uses of unicode to act upon a casefolded
> string without needing to allocate their own copy of one.
>
> The actor function can return a nonzero value to exit early.
>
> Signed-off-by: Daniel Rosenberg <[email protected]>
> ---
> fs/unicode/utf8-core.c | 25 ++++++++++++++++++++++++-
> include/linux/unicode.h | 10 ++++++++++
> 2 files changed, 34 insertions(+), 1 deletion(-)
>
> diff --git a/fs/unicode/utf8-core.c b/fs/unicode/utf8-core.c
> index 2a878b739115d..db050bf59a32b 100644
> --- a/fs/unicode/utf8-core.c
> +++ b/fs/unicode/utf8-core.c
> @@ -122,9 +122,32 @@ int utf8_casefold(const struct unicode_map *um, const struct qstr *str,
>  	}
>  	return -EINVAL;
>  }
> -
>  EXPORT_SYMBOL(utf8_casefold);
>
> +int utf8_casefold_iter(const struct unicode_map *um, const struct qstr *str,
> +		       struct utf8_itr_context *ctx)
> +{
> +	const struct utf8data *data = utf8nfdicf(um->version);
> +	struct utf8cursor cur;
> +	int c;
> +	int res = 0;
> +	int pos = 0;
> +
> +	if (utf8ncursor(&cur, data, str->name, str->len) < 0)
> +		return -EINVAL;
> +
> +	while ((c = utf8byte(&cur))) {
> +		if (c < 0)
> +			return c;
> +		res = ctx->actor(ctx, c, pos);
> +		pos++;
> +		if (res)
> +			return res;
> +	}
> +	return res;
> +}
> +EXPORT_SYMBOL(utf8_casefold_iter);

Indirect function calls are expensive these days for various reasons, including
Spectre mitigations and CFI. Are you sure it's okay from a performance
perspective to make an indirect call for every byte of the pathname?
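A minimal userspace sketch of the callback shape, to make the per-byte cost concrete. Everything here is hypothetical: casefold_iter(), struct itr_context and copy_until_dot() are illustrative names, and the folding is a mock (ASCII A-Z only, bytes >= 0x80 rejected) standing in for utf8ncursor()/utf8byte(), not real NFD casefolding. The actor receives the byte as uint8_t.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct itr_context;

/* Hypothetical actor type: the byte is uint8_t, not a codepoint. */
typedef int (*itr_actor_t)(struct itr_context *ctx, uint8_t byte, int pos);

struct itr_context {
	itr_actor_t actor;
};

/*
 * Same shape as utf8_casefold_iter(): produce one folded byte at a time and
 * hand it to ctx->actor, i.e. one indirect call per output byte. The folding
 * is a mock (ASCII A-Z only; bytes >= 0x80 are rejected with -EINVAL).
 */
static int casefold_iter(const char *name, size_t len, struct itr_context *ctx)
{
	size_t i;
	int res = 0;

	assert(ctx != NULL && ctx->actor != NULL);
	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)name[i];

		if (c >= 0x80)
			return -EINVAL;
		if (c >= 'A' && c <= 'Z')
			c += 'a' - 'A';
		res = ctx->actor(ctx, c, (int)i); /* indirect call */
		if (res)
			return res; /* nonzero from the actor ends the walk early */
	}
	return res;
}

/* Example actor: copy folded bytes into a buffer, stop at the first '.'. */
struct copy_ctx {
	struct itr_context base; /* first member, so the cast below is valid */
	char out[64];
	int calls;
};

static int copy_until_dot(struct itr_context *ctx, uint8_t byte, int pos)
{
	struct copy_ctx *cc = (struct copy_ctx *)ctx;

	cc->calls++;
	if (byte == '.')
		return 1;
	if (pos < (int)sizeof(cc->out) - 1) {
		cc->out[pos] = (char)byte;
		cc->out[pos + 1] = '\0';
	}
	return 0;
}
```

Run on "README.txt" (length 10) with copy_until_dot, the walk makes 7 indirect calls (six letters plus the '.') and returns 1 with "readme" in the buffer. In the kernel, each such call may also pay for a retpoline or a CFI check.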

> +typedef int (*utf8_itr_actor_t)(struct utf8_itr_context *, int byte, int pos);

The byte argument probably should be 'u8', to avoid confusion about whether it's
a byte or a Unicode codepoint.

- Eric


2020-02-14 21:48:13

by Daniel Rosenberg

Subject: Re: [PATCH v7 1/8] unicode: Add utf8_casefold_iter

On Tue, Feb 11, 2020 at 7:38 PM Eric Biggers <[email protected]> wrote:
>
> Indirect function calls are expensive these days for various reasons, including
> Spectre mitigations and CFI. Are you sure it's okay from a performance
> perspective to make an indirect call for every byte of the pathname?
>
> > +typedef int (*utf8_itr_actor_t)(struct utf8_itr_context *, int byte, int pos);
>
> The byte argument probably should be 'u8', to avoid confusion about whether it's
> a byte or a Unicode codepoint.
>
> - Eric

Gabriel, what do you think here? I could either expose the things needed
to do the hashing in libfs, or, instead of the general-purpose iterator,
add a hash function inside fs/unicode that computes the hash given a seed
value.
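A rough sketch of the second option, assuming a seeded hash that lives entirely in the unicode layer. The name casefold_hash(), its signature and the choice of 32-bit FNV-1a are assumptions for illustration, not an existing API, and the folding is again an ASCII-only mock.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical casefold_hash(): the unicode layer folds and hashes in one
 * direct loop and returns only the result, so callers make no per-byte
 * indirect calls. Folding is mocked (ASCII A-Z only; bytes >= 0x80 are
 * rejected). The hash is 32-bit FNV-1a with the seed XORed into the offset
 * basis, chosen for illustration; it is not the kernel's name hash.
 */
static int casefold_hash(const char *name, size_t len, uint32_t seed,
			 uint32_t *hash)
{
	uint32_t h = 2166136261u ^ seed; /* seeded FNV-1a offset basis */
	size_t i;

	assert(hash != NULL);
	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)name[i];

		if (c >= 0x80)
			return -EINVAL;
		if (c >= 'A' && c <= 'Z')
			c += 'a' - 'A';
		h ^= c;
		h *= 16777619u; /* FNV-1a 32-bit prime */
	}
	*hash = h;
	return 0;
}
```

Because the seed only sets the starting state, two names that fold to the same bytes get the same hash under the same seed, and the caller never handles individual bytes.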
-Daniel

2020-02-17 19:19:39

by Gabriel Krisman Bertazi

Subject: Re: [PATCH v7 1/8] unicode: Add utf8_casefold_iter

Daniel Rosenberg <[email protected]> writes:

> On Tue, Feb 11, 2020 at 7:38 PM Eric Biggers <[email protected]> wrote:
>>
>> Indirect function calls are expensive these days for various reasons, including
>> Spectre mitigations and CFI. Are you sure it's okay from a performance
>> perspective to make an indirect call for every byte of the pathname?
>>
>> > +typedef int (*utf8_itr_actor_t)(struct utf8_itr_context *, int byte, int pos);
>>
>> The byte argument probably should be 'u8', to avoid confusion about whether it's
>> a byte or a Unicode codepoint.
>>

Just for the record, utf8byte() returns int because it can return error
codes, but that is not the case here. It should be u8.

>
> Gabriel, what do you think here? I could either expose the things needed
> to do the hashing in libfs, or, instead of the general-purpose iterator,
> add a hash function inside fs/unicode that computes the hash given a seed
> value.

Sorry for the delay, I'm away on a long vacation and intentionally
staying away from my laptop :)

Eric has a very good point: if not prohibitively expensive, it is at least
unnecessarily expensive for a hot path. Why not expose utf8ncursor and
utf8byte to libfs and implement the hash in libfs?
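One way the suggestion could look, sketched in userspace under stated assumptions: fold_cursor_init() and fold_cursor_next() are hypothetical stand-ins for utf8ncursor() and utf8byte(), the folding is an ASCII-only mock, and libfs_ci_hash() with its h * 31 + byte mixing is an invented example. The point is the shape: libfs drives the cursor in a plain loop, checks for a negative return once per byte, then narrows the value to u8 (uint8_t here) before hashing.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* --- "unicode" side: hypothetical exported cursor primitives --- */

/* Stand-in for struct utf8cursor. */
struct fold_cursor {
	const unsigned char *s;
	size_t len;
	size_t i;
};

/* Stand-in for utf8ncursor(): 0 on success, negative errno on bad input. */
static int fold_cursor_init(struct fold_cursor *cur, const char *s, size_t len)
{
	if (!s)
		return -EINVAL;
	cur->s = (const unsigned char *)s;
	cur->len = len;
	cur->i = 0;
	return 0;
}

/*
 * Stand-in for utf8byte(): next folded byte, 0 at the end, or a negative
 * errno (hence the int return). The mock folds ASCII A-Z only and rejects
 * bytes >= 0x80.
 */
static int fold_cursor_next(struct fold_cursor *cur)
{
	unsigned char c;

	if (cur->i >= cur->len)
		return 0;
	c = cur->s[cur->i++];
	if (c >= 0x80)
		return -EINVAL;
	if (c >= 'A' && c <= 'Z')
		c += 'a' - 'A';
	return c;
}

/* --- "libfs" side: the hash loop calls the primitives directly --- */

/* Hypothetical seeded hash (h = h * 31 + byte), for illustration only. */
static int libfs_ci_hash(const char *name, size_t len, uint32_t seed,
			 uint32_t *hash)
{
	struct fold_cursor cur;
	uint32_t h = seed;
	int c;

	assert(hash != NULL);
	if (fold_cursor_init(&cur, name, len) < 0)
		return -EINVAL;
	while ((c = fold_cursor_next(&cur))) {
		if (c < 0)
			return c;
		h = h * 31 + (uint8_t)c; /* error already checked: narrow to a byte */
	}
	*hash = h;
	return 0;
}
```

Compared with the iterator, each byte costs a direct call rather than an indirect one, and the choice of hash stays with libfs.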

--
Gabriel Krisman Bertazi