LinuxLists.cc - [PATCH 1/2] lib/strtox: introduce kstrtoull

2023-12-15 08:40:13

Subject: [PATCH 1/2] lib/strtox: introduce kstrtoull_suffix() helper

Just as mentioned in the comment of memparse(), the simple_strtoull()
usage can lead to overflow all by itself.

Furthermore, the suffix calculation is also very overflow prone, because
some suffixes like "E" would themselves eat 60 bits, leaving only 4 bits
available.

And that suffix "E" can also lead to confusion since it's using the same
char as the hex digit 0xE.

One simple example that exposes all of these problems is to use
memparse() on "25E".
The correct value should be 28823037615171174400, but the suffix E makes
it very easy to overflow, resulting in the incorrect value
10376293541461622784 (9E).
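
The wraparound described above can be reproduced in user space. This is a minimal sketch, assuming a 64-bit unsigned value; unchecked_shift() is a hypothetical name used only for illustration and is not part of the patch or of memparse():

```c
#include <stdint.h>

/*
 * Model of an unchecked suffix shift: the value is shifted left and any
 * bits pushed past bit 63 are silently discarded.
 * Only valid for shift < 64.
 */
static uint64_t unchecked_shift(uint64_t value, unsigned int shift)
{
	return value << shift;
}
```

25 is 0b11001, so a shift by 60 pushes its top bit out of the 64-bit value and leaves 0b1001 << 60, which is the "9E" result quoted above.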

So here we introduce a new helper to address the problem,
kstrtoull_suffix():

- Enhance _kstrtoull()
  This allows _kstrtoull() to return even if it hits an invalid char, as
  long as the optional parameter @retptr is provided.

  If @retptr is provided, _kstrtoull() would try its best to parse the
  valid part, and leave the remainder to be handled by the caller.

  If @retptr is not provided, the behavior is not altered.

- New kstrtoull_suffix() helper
  This new helper utilizes the new @retptr capability of _kstrtoull(),
  and provides 2 new abilities:

  * Allow certain suffixes to be chosen
    The recommended suffix list is "KkMmGgTtPp", excluding the overflow
    prone "Ee". In most cases there is really no need to use the "E"
    suffix anyway.
    And those who really need the exabytes suffix can enable it pretty
    easily.

  * Add overflow checks for the suffixes
    If the original number string is fine, but the extra left shift
    overflows, then -EOVERFLOW is returned.

Cc: Andrew Morton <[email protected]>
Cc: Christophe JAILLET <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: [email protected]
Signed-off-by: Qu Wenruo <[email protected]>
---
include/linux/kstrtox.h | 7 +++
lib/kstrtox.c | 113 ++++++++++++++++++++++++++++++++++++++--
2 files changed, 115 insertions(+), 5 deletions(-)

diff --git a/include/linux/kstrtox.h b/include/linux/kstrtox.h
index 7fcf29a4e0de..12c754152c15 100644
--- a/include/linux/kstrtox.h
+++ b/include/linux/kstrtox.h
@@ -9,6 +9,13 @@
int __must_check _kstrtoul(const char *s, unsigned int base, unsigned long *res);
int __must_check _kstrtol(const char *s, unsigned int base, long *res);

+/*
+ * The default suffix list would not include "E" since it's too easy to overflow
+ * and not much real world usage.
+ */
+#define KSTRTOULL_SUFFIX_DEFAULT ("KkMmGgTtPp")
+int kstrtoull_suffix(const char *s, unsigned int base, unsigned long long *res,
+ const char *suffixes);
int __must_check kstrtoull(const char *s, unsigned int base, unsigned long long *res);
int __must_check kstrtoll(const char *s, unsigned int base, long long *res);

diff --git a/lib/kstrtox.c b/lib/kstrtox.c
index d586e6af5e5a..63831207dfdd 100644
--- a/lib/kstrtox.c
+++ b/lib/kstrtox.c
@@ -93,7 +93,8 @@ unsigned int _parse_integer(const char *s, unsigned int base, unsigned long long
return _parse_integer_limit(s, base, p, INT_MAX);
}

-static int _kstrtoull(const char *s, unsigned int base, unsigned long long *res)
+static int _kstrtoull(const char *s, unsigned int base, unsigned long long *res,
+		      char **retptr)
 {
 	unsigned long long _res;
 	unsigned int rv;
@@ -105,11 +106,19 @@ static int _kstrtoull(const char *s, unsigned int base, unsigned long long *res)
 	if (rv == 0)
 		return -EINVAL;
 	s += rv;
-	if (*s == '\n')
+
+	/*
+	 * If @retptr is provided, caller is responsible to detect
+	 * the extra chars, otherwise we can skip one newline.
+	 */
+	if (!retptr && *s == '\n')
 		s++;
-	if (*s)
+	if (!retptr && *s)
 		return -EINVAL;
+
 	*res = _res;
+	if (retptr)
+		*retptr = (char *)s;
 	return 0;
 }

@@ -133,10 +142,104 @@ int kstrtoull(const char *s, unsigned int base, unsigned long long *res)
{
if (s[0] == '+')
s++;
- return _kstrtoull(s, base, res);
+ return _kstrtoull(s, base, res, NULL);
}
EXPORT_SYMBOL(kstrtoull);

+/**
+ * kstrtoull_suffix - convert a string to ull with suffixes support
+ * @s: The start of the string. The string must be null-terminated, and may also
+ * include a single newline before its terminating null.
+ * @base: The number base to use. The maximum supported base is 16. If base is
+ * given as 0, then the base of the string is automatically detected with the
+ * conventional semantics - If it begins with 0x the number will be parsed as a
+ * hexadecimal (case insensitive), if it otherwise begins with 0, it will be
+ * parsed as an octal number. Otherwise it will be parsed as a decimal.
+ * @res: Where to write the result of the conversion on success.
+ * @suffixes: A string of acceptable suffixes, must be provided. Or caller
+ * should use kstrtoull() directly.
+ *
+ *
+ * Return 0 on success.
+ *
+ * Return -ERANGE on overflow or -EINVAL if invalid chars found.
+ * Return value must be checked.
+ */
+int kstrtoull_suffix(const char *s, unsigned int base, unsigned long long *res,
+		     const char *suffixes)
+{
+	unsigned long long init_value;
+	unsigned long long final_value;
+	char *endptr;
+	int ret;
+
+	ret = _kstrtoull(s, base, &init_value, &endptr);
+	/* Either already overflow or no number string at all. */
+	if (ret < 0)
+		return ret;
+	final_value = init_value;
+	/* No suffixes. */
+	if (!*endptr)
+		goto done;
+
+	switch (*endptr) {
+	case 'K':
+	case 'k':
+		if (!strchr(suffixes, *endptr))
+			return -EINVAL;
+		final_value <<= 10;
+		endptr++;
+		break;
+	case 'M':
+	case 'm':
+		if (!strchr(suffixes, *endptr))
+			return -EINVAL;
+		final_value <<= 20;
+		endptr++;
+		break;
+	case 'G':
+	case 'g':
+		if (!strchr(suffixes, *endptr))
+			return -EINVAL;
+		final_value <<= 30;
+		endptr++;
+		break;
+	case 'T':
+	case 't':
+		if (!strchr(suffixes, *endptr))
+			return -EINVAL;
+		final_value <<= 40;
+		endptr++;
+		break;
+	case 'P':
+	case 'p':
+		if (!strchr(suffixes, *endptr))
+			return -EINVAL;
+		final_value <<= 50;
+		endptr++;
+		break;
+	case 'E':
+	case 'e':
+		if (!strchr(suffixes, *endptr))
+			return -EINVAL;
+		final_value <<= 60;
+		endptr++;
+		break;
+	}
+	if (*endptr == '\n')
+		endptr++;
+	if (*endptr)
+		return -EINVAL;
+
+	/* Overflow check. */
+	if (final_value < init_value)
+		return -EOVERFLOW;
+done:
+	*res = final_value;
+	return 0;
+}
+EXPORT_SYMBOL(kstrtoull_suffix);
+
/**
* kstrtoll - convert a string to a long long
* @s: The start of the string. The string must be null-terminated, and may also
@@ -159,7 +262,7 @@ int kstrtoll(const char *s, unsigned int base, long long *res)
int rv;

if (s[0] == '-') {
- rv = _kstrtoull(s + 1, base, &tmp);
+ rv = _kstrtoull(s + 1, base, &tmp, NULL);
if (rv < 0)
return rv;
if ((long long)-tmp > 0)
--
2.43.0

2023-12-18 13:12:02

by David Disseldorp


Subject: Re: [PATCH 1/2] lib/strtox: introduce kstrtoull_suffix() helper

Hi Qu,

On Fri, 15 Dec 2023 19:09:23 +1030, Qu Wenruo wrote:

> Just as mentioned in the comment of memparse(), the simple_strtoull()
> usage can lead to overflow all by itself.
>
> Furthermore, the suffix calculation is also super overflow prone because
> that some suffix like "E" itself would eat 60bits, leaving only 4 bits
> available.
>
> And that suffix "E" can also lead to confusion since it's using the same
> char as the hex digit 0xE.
>
> One simple example to expose all the problem is to use memparse() on
> "25E".
> The correct value should be 28823037615171174400, but the suffix E makes
> it super simple to overflow, resulting the incorrect value
> 10376293541461622784 (9E).
>
> So here we introduce a new helper to address the problem,
> kstrtoull_suffix():
>
> - Enhance _kstrtoull()
> This allow _kstrtoull() to return even if it hits an invalid char, as
> long as the optional parameter @retptr is provided.
>
> If @retptr is provided, _kstrtoull() would try its best to parse the
> valid part, and leave the remaining to be handled by the caller.
>
> If @retptr is not provided, the behavior is not altered.
>
> - New kstrtoull_suffix() helper
> This new helper utilize the new @retptr capability of _kstrtoull(),
> and provides 2 new ability:
>
> * Allow certain suffixes to be chosen
> The recommended suffix list is "KkMmGgTtPp", excluding the overflow
> prone "Ee". Undermost cases there is really no need to use "E" suffix
> anyway.
> And for those who really need that exabytes suffix, they can enable
> that suffix pretty easily.
>
> * Add overflow checks for the suffixes
> If the original number string is fine, but with the extra left
> shift overflow happens, then -EOVERFLOW is returned.
>
> Cc: Andrew Morton <[email protected]>
> Cc: Christophe JAILLET <[email protected]>
> Cc: Andy Shevchenko <[email protected]>
> Cc: [email protected]
> Signed-off-by: Qu Wenruo <[email protected]>
> ---
> include/linux/kstrtox.h | 7 +++
> lib/kstrtox.c | 113 ++++++++++++++++++++++++++++++++++++++--
> 2 files changed, 115 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/kstrtox.h b/include/linux/kstrtox.h
> index 7fcf29a4e0de..12c754152c15 100644
> --- a/include/linux/kstrtox.h
> +++ b/include/linux/kstrtox.h
> @@ -9,6 +9,13 @@
> int __must_check _kstrtoul(const char *s, unsigned int base, unsigned long *res);
> int __must_check _kstrtol(const char *s, unsigned int base, long *res);
>
> +/*
> + * The default suffix list would not include "E" since it's too easy to overflow
> + * and not much real world usage.
> + */
> +#define KSTRTOULL_SUFFIX_DEFAULT ("KkMmGgTtPp")
> +int kstrtoull_suffix(const char *s, unsigned int base, unsigned long long *res,
> + const char *suffixes);
> int __must_check kstrtoull(const char *s, unsigned int base, unsigned long long *res);
> int __must_check kstrtoll(const char *s, unsigned int base, long long *res);
>
> diff --git a/lib/kstrtox.c b/lib/kstrtox.c
> index d586e6af5e5a..63831207dfdd 100644
> --- a/lib/kstrtox.c
> +++ b/lib/kstrtox.c
> @@ -93,7 +93,8 @@ unsigned int _parse_integer(const char *s, unsigned int base, unsigned long long
> return _parse_integer_limit(s, base, p, INT_MAX);
> }
>
> -static int _kstrtoull(const char *s, unsigned int base, unsigned long long *res)
> +static int _kstrtoull(const char *s, unsigned int base, unsigned long long *res,
> + char **retptr)
> {
> unsigned long long _res;
> unsigned int rv;
> @@ -105,11 +106,19 @@ static int _kstrtoull(const char *s, unsigned int base, unsigned long long *res)
> if (rv == 0)
> return -EINVAL;
> s += rv;
> - if (*s == '\n')
> +
> + /*
> + * If @retptr is provided, caller is responsible to detect
> + * the extra chars, otherwise we can skip one newline.
> + */
> + if (!retptr && *s == '\n')
> s++;
> - if (*s)
> + if (!retptr && *s)
> return -EINVAL;
> +
> *res = _res;
> + if (retptr)
> + *retptr = (char *)s;
> return 0;
> }
>
> @@ -133,10 +142,104 @@ int kstrtoull(const char *s, unsigned int base, unsigned long long *res)
> {
> if (s[0] == '+')
> s++;
> - return _kstrtoull(s, base, res);
> + return _kstrtoull(s, base, res, NULL);
> }
> EXPORT_SYMBOL(kstrtoull);
>
> +/**
> + * kstrtoull_suffix - convert a string to ull with suffixes support
> + * @s: The start of the string. The string must be null-terminated, and may also
> + * include a single newline before its terminating null.
> + * @base: The number base to use. The maximum supported base is 16. If base is
> + * given as 0, then the base of the string is automatically detected with the
> + * conventional semantics - If it begins with 0x the number will be parsed as a
> + * hexadecimal (case insensitive), if it otherwise begins with 0, it will be
> + * parsed as an octal number. Otherwise it will be parsed as a decimal.
> + * @res: Where to write the result of the conversion on success.
> + * @suffixes: A string of acceptable suffixes, must be provided. Or caller
> + * should use kstrtoull() directly.

The suffixes parameter seems a bit cumbersome; callers need to provide
both upper and lower cases, and unsupported characters aren't checked
for. However, I can't think of any better suggestions at this stage.

> + *
> + *
> + * Return 0 on success.
> + *
> + * Return -ERANGE on overflow or -EINVAL if invalid chars found.
> + * Return value must be checked.
> + */
> +int kstrtoull_suffix(const char *s, unsigned int base, unsigned long long *res,
> + const char *suffixes)
> +{
> + unsigned long long init_value;
> + unsigned long long final_value;
> + char *endptr;
> + int ret;
> +
> + ret = _kstrtoull(s, base, &init_value, &endptr);
> + /* Either already overflow or no number string at all. */
> + if (ret < 0)
> + return ret;
> + final_value = init_value;
> + /* No suffixes. */
> + if (!*endptr)
> + goto done;
> +
> + switch (*endptr) {
> + case 'K':
> + case 'k':
> + if (!strchr(suffixes, *endptr))
> + return -EINVAL;
> + final_value <<= 10;
> + endptr++;
> + break;
> + case 'M':
> + case 'm':
> + if (!strchr(suffixes, *endptr))
> + return -EINVAL;
> + final_value <<= 20;
> + endptr++;
> + break;
> + case 'G':
> + case 'g':
> + if (!strchr(suffixes, *endptr))
> + return -EINVAL;
> + final_value <<= 30;
> + endptr++;
> + break;
> + case 'T':
> + case 't':
> + if (!strchr(suffixes, *endptr))
> + return -EINVAL;
> + final_value <<= 40;
> + endptr++;
> + break;
> + case 'P':
> + case 'p':
> + if (!strchr(suffixes, *endptr))
> + return -EINVAL;
> + final_value <<= 50;
> + endptr++;
> + break;
> + case 'E':
> + case 'e':
> + if (!strchr(suffixes, *endptr))
> + return -EINVAL;
> + final_value <<= 60;
> + endptr++;
> + break;
> + }
> + if (*endptr == '\n')

Nit: the per-case logic could be simplified to a single "shift_val = X"
if you initialise and handle !shift_val.
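
The simplification suggested above could look roughly like the following user-space sketch. suffix_to_shift() is a hypothetical helper, not code from the patch; it assumes the same "allowed suffixes" string convention as kstrtoull_suffix():

```c
#include <errno.h>
#include <string.h>

/*
 * Map a unit suffix to its shift, then do the allow-list check once.
 * Returns the shift (10..60), 0 if @c is not a unit suffix at all,
 * or -EINVAL if it is a unit suffix the caller did not allow.
 */
static int suffix_to_shift(char c, const char *suffixes)
{
	int shift_val = 0;

	switch (c) {
	case 'K': case 'k': shift_val = 10; break;
	case 'M': case 'm': shift_val = 20; break;
	case 'G': case 'g': shift_val = 30; break;
	case 'T': case 't': shift_val = 40; break;
	case 'P': case 'p': shift_val = 50; break;
	case 'E': case 'e': shift_val = 60; break;
	}
	/* Not a unit suffix: let the caller decide what to do with @c. */
	if (!shift_val)
		return 0;
	if (!strchr(suffixes, c))
		return -EINVAL;
	return shift_val;
}
```

With the shift returned as a plain number, the strchr() check, the shift itself and the endptr++ each only need to appear once in the caller.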

> + endptr++;
> + if (*endptr)
> + return -EINVAL;
> +
> + /* Overflow check. */
> + if (final_value < init_value)
> + return -EOVERFLOW;
> +done:
> + *res = final_value;
> + return 0;
> +}
> +EXPORT_SYMBOL(kstrtoull_suffix);
> +
> /**
> * kstrtoll - convert a string to a long long
> * @s: The start of the string. The string must be null-terminated, and may also
> @@ -159,7 +262,7 @@ int kstrtoll(const char *s, unsigned int base, long long *res)
> int rv;
>
> if (s[0] == '-') {
> - rv = _kstrtoull(s + 1, base, &tmp);
> + rv = _kstrtoull(s + 1, base, &tmp, NULL);
> if (rv < 0)
> return rv;
> if ((long long)-tmp > 0)

2023-12-18 13:44:33

by Andy Shevchenko


Subject: Re: [PATCH 1/2] lib/strtox: introduce kstrtoull_suffix() helper

On Fri, Dec 15, 2023 at 07:09:23PM +1030, Qu Wenruo wrote:
> Just as mentioned in the comment of memparse(), the simple_strtoull()
> usage can lead to overflow all by itself.
>
> Furthermore, the suffix calculation is also super overflow prone because
> that some suffix like "E" itself would eat 60bits, leaving only 4 bits
> available.
>
> And that suffix "E" can also lead to confusion since it's using the same
> char as the hex digit 0xE.

How would you distinguish 25E from [0x]25e?
I believe it's an unsolvable issue as long as we already have it.

> One simple example to expose all the problem is to use memparse() on
> "25E".
> The correct value should be 28823037615171174400, but the suffix E makes
> it super simple to overflow, resulting the incorrect value
> 10376293541461622784 (9E).

So, then you can probably improve memparse()?

> So here we introduce a new helper to address the problem,
> kstrtoull_suffix():

This is horrible naming. What suffix? What would it be without it
(if it's even possible)? I have more questions than answers...

> - Enhance _kstrtoull()
> This allow _kstrtoull() to return even if it hits an invalid char, as
> long as the optional parameter @retptr is provided.
>
> If @retptr is provided, _kstrtoull() would try its best to parse the
> valid part, and leave the remaining to be handled by the caller.
>
> If @retptr is not provided, the behavior is not altered.

Can we not touch that one? I admit that it may not be used in the hot paths,
but I prefer that it does exactly what it does in a strict way.

> - New kstrtoull_suffix() helper
> This new helper utilize the new @retptr capability of _kstrtoull(),
> and provides 2 new ability:
>
> * Allow certain suffixes to be chosen
> The recommended suffix list is "KkMmGgTtPp", excluding the overflow
> prone "Ee". Undermost cases there is really no need to use "E" suffix
> anyway.
> And for those who really need that exabytes suffix, they can enable
> that suffix pretty easily.
>
> * Add overflow checks for the suffixes
> If the original number string is fine, but with the extra left
> shift overflow happens, then -EOVERFLOW is returned.

And formal NAK due to lack of test cases. We do not accept new generic
code without test cases.

--
With Best Regards,
Andy Shevchenko

2023-12-18 19:53:21

by Qu Wenruo


Subject: Re: [PATCH 1/2] lib/strtox: introduce kstrtoull_suffix() helper

On 2023/12/18 23:29, David Disseldorp wrote:
> Hi Qu,
>
> On Fri, 15 Dec 2023 19:09:23 +1030, Qu Wenruo wrote:
>
[...]
>> +/**
>> + * kstrtoull_suffix - convert a string to ull with suffixes support
>> + * @s: The start of the string. The string must be null-terminated, and may also
>> + * include a single newline before its terminating null.
>> + * @base: The number base to use. The maximum supported base is 16. If base is
>> + * given as 0, then the base of the string is automatically detected with the
>> + * conventional semantics - If it begins with 0x the number will be parsed as a
>> + * hexadecimal (case insensitive), if it otherwise begins with 0, it will be
>> + * parsed as an octal number. Otherwise it will be parsed as a decimal.
>> + * @res: Where to write the result of the conversion on success.
>> + * @suffixes: A string of acceptable suffixes, must be provided. Or caller
>> + * should use kstrtoull() directly.
>
> The suffixes parameter seems a bit cumbersome; callers need to provide
> both upper and lower cases, and unsupported characters aren't checked
> for. However, I can't think of any better suggestions at this stage.
>

Initially I went with a bitmap for the suffixes, but it's not any better.

Firstly, where should the bitmap start? If we go with bit 0 for "K", then
the code would introduce some difference between the bit number and the
left shift (bit 0, left shift 10), which may be a little confusing.

If we go with bit 1 for "K", the bit number and left shift match much
better, but bit 0 behavior would be left untouched.

Finally, the bitmap itself is not that straightforward.

The limitation of providing both upper and lower case is due to the fact
that we don't have a case insensitive version of strchr().
But I think it's not that hard to fix, just convert them all to lower or
upper case, then do the strchr().

Would accepting both cases for the suffixes be good enough?
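
A case-insensitive membership check along those lines might look like this in user space. suffix_allowed() is a hypothetical name, and the sketch uses tolower() from <ctype.h> instead of any kernel helper:

```c
#include <ctype.h>
#include <stdbool.h>

/*
 * Return true if @c matches any character of @suffixes, ignoring case,
 * so a caller can pass "KMGTP" and still accept "k", "m", ...
 * A NUL @c never matches.
 */
static bool suffix_allowed(const char *suffixes, char c)
{
	int want = tolower((unsigned char)c);

	if (!c)
		return false;
	for (; *suffixes; suffixes++) {
		if (tolower((unsigned char)*suffixes) == want)
			return true;
	}
	return false;
}
```

This keeps the string-based interface while removing the need to list both cases of every suffix.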

>> + *
>> + *
>> + * Return 0 on success.
>> + *
>> + * Return -ERANGE on overflow or -EINVAL if invalid chars found.
>> + * Return value must be checked.
>> + */
>> +int kstrtoull_suffix(const char *s, unsigned int base, unsigned long long *res,
>> + const char *suffixes)
>> +{
>> + unsigned long long init_value;
>> + unsigned long long final_value;
>> + char *endptr;
>> + int ret;
>> +
>> + ret = _kstrtoull(s, base, &init_value, &endptr);
>> + /* Either already overflow or no number string at all. */
>> + if (ret < 0)
>> + return ret;
>> + final_value = init_value;
>> + /* No suffixes. */
>> + if (!*endptr)
>> + goto done;
>> +
>> + switch (*endptr) {
>> + case 'K':
>> + case 'k':
>> + if (!strchr(suffixes, *endptr))
>> + return -EINVAL;
>> + final_value <<= 10;
>> + endptr++;
>> + break;
>> + case 'M':
>> + case 'm':
>> + if (!strchr(suffixes, *endptr))
>> + return -EINVAL;
>> + final_value <<= 20;
>> + endptr++;
>> + break;
>> + case 'G':
>> + case 'g':
>> + if (!strchr(suffixes, *endptr))
>> + return -EINVAL;
>> + final_value <<= 30;
>> + endptr++;
>> + break;
>> + case 'T':
>> + case 't':
>> + if (!strchr(suffixes, *endptr))
>> + return -EINVAL;
>> + final_value <<= 40;
>> + endptr++;
>> + break;
>> + case 'P':
>> + case 'p':
>> + if (!strchr(suffixes, *endptr))
>> + return -EINVAL;
>> + final_value <<= 50;
>> + endptr++;
>> + break;
>> + case 'E':
>> + case 'e':
>> + if (!strchr(suffixes, *endptr))
>> + return -EINVAL;
>> + final_value <<= 60;
>> + endptr++;
>> + break;
>> + }
>> + if (*endptr == '\n')
>
> Nit: the per-case logic could be simplified to a single "shift_val = X"
> if you initialise and handle !shift_val.

Indeed, thanks for the hint!

Thanks,
Qu
>
>> + endptr++;
>> + if (*endptr)
>> + return -EINVAL;
>> +
>> + /* Overflow check. */
>> + if (final_value < init_value)
>> + return -EOVERFLOW;
>> +done:
>> + *res = final_value;
>> + return 0;
>> +}
>> +EXPORT_SYMBOL(kstrtoull_suffix);
>> +
>> /**
>> * kstrtoll - convert a string to a long long
>> * @s: The start of the string. The string must be null-terminated, and may also
>> @@ -159,7 +262,7 @@ int kstrtoll(const char *s, unsigned int base, long long *res)
>> int rv;
>>
>> if (s[0] == '-') {
>> - rv = _kstrtoull(s + 1, base, &tmp);
>> + rv = _kstrtoull(s + 1, base, &tmp, NULL);
>> if (rv < 0)
>> return rv;
>> if ((long long)-tmp > 0)
>
>
>

2023-12-19 03:17:51

by David Disseldorp


Subject: Re: [PATCH 1/2] lib/strtox: introduce kstrtoull_suffix() helper

On Tue, 19 Dec 2023 06:22:42 +1030, Qu Wenruo wrote:

> On 2023/12/18 23:29, David Disseldorp wrote:
> > Hi Qu,
> >
> > On Fri, 15 Dec 2023 19:09:23 +1030, Qu Wenruo wrote:
> >
> [...]
> >> +/**
> >> + * kstrtoull_suffix - convert a string to ull with suffixes support
> >> + * @s: The start of the string. The string must be null-terminated, and may also
> >> + * include a single newline before its terminating null.
> >> + * @base: The number base to use. The maximum supported base is 16. If base is
> >> + * given as 0, then the base of the string is automatically detected with the
> >> + * conventional semantics - If it begins with 0x the number will be parsed as a
> >> + * hexadecimal (case insensitive), if it otherwise begins with 0, it will be
> >> + * parsed as an octal number. Otherwise it will be parsed as a decimal.
> >> + * @res: Where to write the result of the conversion on success.
> >> + * @suffixes: A string of acceptable suffixes, must be provided. Or caller
> >> + * should use kstrtoull() directly.
> >
> > The suffixes parameter seems a bit cumbersome; callers need to provide
> > both upper and lower cases, and unsupported characters aren't checked
> > for. However, I can't think of any better suggestions at this stage.
> >
>
> Initially I went bitmap for the prefixes, but it's not any better.
>
> Firstly where the bitmap should start. If we go bit 0 for "K", then the
> code would introduce some difference between the bit number and left
> shift (bit 0, left shift 10), which may be a little confusing.
>
> If we go bit 1 for "K", the bit and left shift it much better, but bit 0
> behavior would be left untouched.
>
> Finally the bitmap itself is not that straightforward.

One benefit from a bitmap would be that unsupported @suffixes are easier
to detect (instead of ignored), but I think if you rename the function
kstrtoull_unit_suffix() then it should be pretty clear what's supported.

> The limitation of providing both upper and lower case is due to the fact
> that we don't have a case insensitive version of strchr().
> But I think it's not that to fix, just convert them all to lower or
> upper case, then do the strchr().
>
> Would accepting both cases for the suffixes be good enough?

I think so.

Cheers, David

2023-12-19 16:45:49

by David Laight


Subject: RE: [PATCH 1/2] lib/strtox: introduce kstrtoull_suffix() helper

From: David Disseldorp
> Sent: 18 December 2023 13:00
>
> On Fri, 15 Dec 2023 19:09:23 +1030, Qu Wenruo wrote:
>
> > Just as mentioned in the comment of memparse(), the simple_strtoull()
> > usage can lead to overflow all by itself.
> >
> > Furthermore, the suffix calculation is also super overflow prone because
> > that some suffix like "E" itself would eat 60bits, leaving only 4 bits
> > available.
> >
> > And that suffix "E" can also lead to confusion since it's using the same
> > char as the hex digit 0xE.
> >
> > One simple example to expose all the problem is to use memparse() on
> > "25E".
> > The correct value should be 28823037615171174400, but the suffix E makes
> > it super simple to overflow, resulting the incorrect value
> > 10376293541461622784 (9E).

Some more bikeshed paint :-)
...
> > + ret = _kstrtoull(s, base, &init_value, &endptr);
> > + /* Either already overflow or no number string at all. */
> > + if (ret < 0)
> > + return ret;
> > + final_value = init_value;
> > + /* No suffixes. */
> > + if (!*endptr)
> > + goto done;

How about:
	suffix = *endptr;
	if (!strchr(suffixes, suffix))
		return -EINVAL;
	shift = strcspn("KkMmGgTtPp", suffix)/2 * 10 + 10;
	if (shift > 50)
		return -EINVAL;
	if (value >> (64 - shift))
		return -EOVERFLOW;
	value <<= shift;

Although purists might want to multiply by 1000 not 1024.
And SI multipliers are all upper-case - except k.
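
If multiplication is used instead of a shift, the same code can serve both 1000-based and 1024-based units. A minimal user-space sketch, assuming the GCC/Clang builtin __builtin_mul_overflow() (the kernel's counterpart would be check_mul_overflow()); scale_checked() is a hypothetical name:

```c
#include <errno.h>

/*
 * Multiply @value by @unit, @power times (e.g. unit 1024, power 3 for "G"),
 * failing with -EOVERFLOW as soon as the product no longer fits.
 * On success the scaled value is stored in @res.
 */
static int scale_checked(unsigned long long value, unsigned long long unit,
			 unsigned int power, unsigned long long *res)
{
	while (power--) {
		/* The builtin returns true when the product overflowed. */
		if (__builtin_mul_overflow(value, unit, &value))
			return -EOVERFLOW;
	}
	*res = value;
	return 0;
}
```

Here "25E" with unit 1024 and power 6 is rejected, while the largest value that still fits with that suffix, "15E", is accepted.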

...
> > + /* Overflow check. */
> > + if (final_value < init_value)
> > + return -EOVERFLOW;

That is just plain wrong.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2023-12-19 21:18:28

by Qu Wenruo


Subject: Re: [PATCH 1/2] lib/strtox: introduce kstrtoull_suffix() helper

On 2023/12/20 03:12, David Laight wrote:
> From: David Disseldorp
>> Sent: 18 December 2023 13:00
>>
>> On Fri, 15 Dec 2023 19:09:23 +1030, Qu Wenruo wrote:
>>
>>> Just as mentioned in the comment of memparse(), the simple_strtoull()
>>> usage can lead to overflow all by itself.
>>>
>>> Furthermore, the suffix calculation is also super overflow prone because
>>> that some suffix like "E" itself would eat 60bits, leaving only 4 bits
>>> available.
>>>
>>> And that suffix "E" can also lead to confusion since it's using the same
>>> char as the hex digit 0xE.
>>>
>>> One simple example to expose all the problem is to use memparse() on
>>> "25E".
>>> The correct value should be 28823037615171174400, but the suffix E makes
>>> it super simple to overflow, resulting the incorrect value
>>> 10376293541461622784 (9E).
>
> Some more bikeshed paint :-)
> ...
>>> + ret = _kstrtoull(s, base, &init_value, &endptr);
>>> + /* Either already overflow or no number string at all. */
>>> + if (ret < 0)
>>> + return ret;
>>> + final_value = init_value;
>>> + /* No suffixes. */
>>> + if (!*endptr)
>>> + goto done;
>
> How about:
> suffix = *endptr;
> if (!strchr(suffixes, suffix))
> return -EINVAL;
> shift = strcspn("KkMmGgTtPp", suffix)/2 * 10 + 10;

This means the caller has to provide the suffix string in this
particular order.
For the default suffix list it's not that hard as it's already defined as a
macro.

But for those call sites which need "E", a wrongly located "Ee" can screw
up the whole process.

> if (shift > 50)
> return -EINVAL;
> if (value >> (64 - shift))
> return -EOVERFLOW;
> value <<= shift;
>
> Although purists might want to multiply by 1000 not 1024.
> And SI multipliers are all upper-case - except k.
>
> ...
>>> + /* Overflow check. */
>>> + if (final_value < init_value)
>>> + return -EOVERFLOW;
>
> That is just plain wrong.

Indeed, I just found a very simple example to prove it wrong: take the
4 bit binary value 0110 and left shift it by 2. The result, truncated to
4 bits, is 1000, which is still larger than the original value.
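
The same counterexample scales to 64 bits. The sketch below, with hypothetical function names, puts the comparison from the patch next to the top-bits test suggested earlier in the thread:

```c
#include <stdbool.h>
#include <stdint.h>

/* The check from the patch: compare the shifted value with the original. */
static bool overflow_by_compare(uint64_t value, unsigned int shift)
{
	return (uint64_t)(value << shift) < value;
}

/*
 * Check whether any of the top @shift bits are set, i.e. whether the
 * shift would push a set bit out of the value.
 * Only valid for 0 < shift < 64.
 */
static bool overflow_by_top_bits(uint64_t value, unsigned int shift)
{
	return (value >> (64 - shift)) != 0;
}
```

For 0x6000000000000000 shifted by 2 the comparison sees 0x8000000000000000, which is larger than the input, and reports no overflow. The "25E" case from the commit message slips through the same way, since 9 << 60 is larger than 25. The top-bits test catches both.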

Thanks,
Qu
>
> David
>

2023-12-20 08:32:29

by David Laight


Subject: RE: [PATCH 1/2] lib/strtox: introduce kstrtoull_suffix() helper

From: Qu Wenruo
> Sent: 19 December 2023 21:18
...
> > How about:
> > suffix = *endptr;
> > if (!strchr(suffixes, suffix))
> > return -EINVAL;
> > shift = strcspn("KkMmGgTtPp", suffix)/2 * 10 + 10;
>
> This means the caller has to provide the suffix string in this
> particular order.

No, The strchr() checks that the suffix is one the caller wanted.
The strcspn() is against a fixed list - so the order can be
selected to make the code shorter.

Actually strcspn() isn't the function you need.
There might be a function like strchr() that returns a count
but I can't remember its name and it may not be in kernel.
You might have to write:
	shift = 0;
	for (const char *sfp = "KkMmGgTtPp"; suffix != *sfp; sfp++, shift++) {
		if (!*sfp)
			return -EINVAL;
	}
	shift = (shift/2 + 1) * 10;
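
One way to get "strchr() that returns a count" is to subtract the start of the fixed list from the pointer strchr() returns. A user-space sketch, with unit_shift() as a hypothetical name:

```c
#include <errno.h>
#include <string.h>

/*
 * Look @suffix up in a fixed, ordered list and derive the shift from its
 * position: "Kk" -> 10, "Mm" -> 20, ... "Pp" -> 50.
 * Returns -EINVAL for anything that is not in the list.
 */
static int unit_shift(char suffix)
{
	static const char units[] = "KkMmGgTtPp";
	const char *pos;

	/* strchr() would match the terminating NUL, so reject it first. */
	if (!suffix)
		return -EINVAL;
	pos = strchr(units, suffix);
	if (!pos)
		return -EINVAL;
	return ((int)(pos - units) / 2 + 1) * 10;
}
```

Because the list is fixed inside the function, the order of the caller's own suffix string does not matter. Appending "Ee" to the list would extend the mapping to a shift of 60.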

David


2023-12-20 09:32:31

by Qu Wenruo


Subject: Re: [PATCH 1/2] lib/strtox: introduce kstrtoull_suffix() helper

On 2023/12/20 19:01, David Laight wrote:
> From: Qu Wenruo
>> Sent: 19 December 2023 21:18
> ...
>>> How about:
>>> suffix = *endptr;
>>> if (!strchr(suffixes, suffix))
>>> return -EINVAL;
>>> shift = strcspn("KkMmGgTtPp", suffix)/2 * 10 + 10;
>>
>> This means the caller has to provide the suffix string in this
>> particular order.
>
> No, The strchr() checks that the suffix is one the caller wanted.
> The strcspn() is against a fixed list - so the order can be
> selected to make the code shorter.

Ah, got it.
Although in that case, the 1st parameter should be ("KkMmGgTtPpEe"), as
we still support "Ee", just not by default.

Thanks,
Qu
>
> Actually strcspn() isn't the function you need.
> There might be a function like strchr() that returns a count
> but I can't remember its name and it may not be in kernel.
> You might have to write:
> shift = 0;
> for (const char *sfp = "KkMmGgTtPp"; suffix != *sfp; sfp++, shift++) {
> if (!*sfp)
> return -EINVAL;
> }
> shift = (shift/2 + 1) * 10;
>
> David
>

2023-12-20 09:55:12

by Alexey Dobriyan


Subject: Re: [PATCH 1/2] lib/strtox: introduce kstrtoull_suffix() helper

> Just as mentioned in the comment of memparse(), the simple_strtoull()
> usage can lead to overflow all by itself.

which is the root cause...

I don't like one char suffixes. They are easy to integrate but then the
_real_ suffixes are "MiB", "GiB", etc.

If you care only about memparse(), then using _parse_integer() can be
arranged. I don't see why not.
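
To illustrate the point about the root cause, here is a user-space model of digit parsing that reports overflow instead of wrapping. It is only a sketch of the idea, not the kernel's _parse_integer(); parse_digits() and digit_value() are hypothetical names, and @base is assumed to be between 2 and 16:

```c
#include <errno.h>
#include <limits.h>

/* Value of a single digit character, or -1 if it is not a hex digit. */
static int digit_value(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Parse leading digits of @s in @base into @res.
 * Returns the number of characters consumed (0 if there were no digits),
 * or -ERANGE if the number does not fit in unsigned long long.
 */
static int parse_digits(const char *s, unsigned int base,
			unsigned long long *res)
{
	unsigned long long val = 0;
	int n = 0;

	for (;; s++, n++) {
		int d = digit_value(*s);

		if (d < 0 || (unsigned int)d >= base)
			break;
		/* val * base + d would exceed ULLONG_MAX. */
		if (val > (ULLONG_MAX - d) / base)
			return -ERANGE;
		val = val * base + d;
	}
	*res = val;
	return n;
}
```

The character following the consumed digits is then available for suffix handling. Parsing "25E" in base 10 consumes "25" and leaves "E" as a suffix, while in base 16 the whole string is consumed as 0x25e.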

2023-12-20 10:02:22

by Qu Wenruo


Subject: Re: [PATCH 1/2] lib/strtox: introduce kstrtoull_suffix() helper

On 2023/12/20 20:24, Alexey Dobriyan wrote:
>> Just as mentioned in the comment of memparse(), the simple_strtoull()
>> usage can lead to overflow all by itself.
>
> which is the root cause...
>
> I don't like one char suffixes. They are easy to integrate but then the
> _real_ suffixes are "MiB", "GiB", etc.
>
> If you care only about memparse(), then using _parse_integer() can be
> arranged. I don't see why not.

Well, personally speaking I don't think we should even support the
suffix at all, at least for the only two usages inside btrfs.

But unfortunately I'm not the one to make the final call, and the final
call is to keep the suffix behavior...

And indeed using _parse_integer() with _parse_integer_fixup_radix()
would be better, as we don't need to extend the _kstrtoull() code base.

Thanks,
Qu


2023-12-20 14:24:25

by Andy Shevchenko


Subject: Re: [PATCH 1/2] lib/strtox: introduce kstrtoull_suffix() helper

On Wed, Dec 20, 2023 at 08:31:09PM +1030, Qu Wenruo wrote:
> On 2023/12/20 20:24, Alexey Dobriyan wrote:
> > > Just as mentioned in the comment of memparse(), the simple_strtoull()
> > > usage can lead to overflow all by itself.
> >
> > which is the root cause...
> >
> > I don't like one char suffixes. They are easy to integrate but then the
> > _real_ suffixes are "MiB", "GiB", etc.
> >
> > If you care only about memparse(), then using _parse_integer() can be
> > arranged. I don't see why not.
>
> Well, personally speaking I don't think we should even support the suffix at
> all, at least for the only two usage inside btrfs.
>
> But unfortunately I'm not the one to do the final call, and the final call
> is to keep the suffix behavior...
>
> And indeed using _parse_integer() with _parse_interger_fixup_radix() would
> be better, as we don't need to extend the _kstrtoull() code base.

My comment on the first patch vanished due to my MTA issues, but I'll try
to summarize my point here.

First of all, I do not like the naming, it's too vague. What kind of suffix?
Are we supposed to have a suffix in the input? What will be the behaviour w/o
a suffix? And so on...

Second, if it's a problem in memparse(), just fix it and that's all.

Third, as Alexey said, we have metric and byte suffixes and they are different.
Supporting one without the other is just adding to the existing confusion.

Last, but not least, we do NOT accept new code in the lib/ without test cases.

So, that said here is my formal NAK for this series (at least in this form).
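
To make the third point concrete, here is a minimal userspace sketch of how
the two suffix families differ. The names demo_unit and
demo_unit_multiplier() are hypothetical and exist only for this
illustration; nothing like them is part of the patch or of lib/.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical table: metric (powers of 1000) vs. binary (powers of 1024). */
struct demo_unit {
	const char *suffix;
	unsigned long long multiplier;
};

static const struct demo_unit demo_units[] = {
	{ "kB",  1000ULL },
	{ "MB",  1000ULL * 1000 },
	{ "GB",  1000ULL * 1000 * 1000 },
	{ "KiB", 1ULL << 10 },
	{ "MiB", 1ULL << 20 },
	{ "GiB", 1ULL << 30 },
};

/* Return the multiplier for a full suffix string, or 0 if it is unknown. */
unsigned long long demo_unit_multiplier(const char *suffix)
{
	size_t i;

	assert(suffix != NULL);
	for (i = 0; i < sizeof(demo_units) / sizeof(demo_units[0]); i++) {
		if (strcmp(suffix, demo_units[i].suffix) == 0)
			return demo_units[i].multiplier;
	}
	return 0;
}
```

A bare "M" matches neither family in this table, which is exactly the
ambiguity a one-character suffix leaves open.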

P.S. The Subject should start with either kstrtox: or lib/kstrtox.c.

--
With Best Regards,
Andy Shevchenko

2023-12-20 15:02:12

by Andy Shevchenko

Subject: Re: [PATCH 1/2] lib/strtox: introduce kstrtoull_suffix() helper

2023-12-20 20:38:53

by Qu Wenruo

Subject: Re: [PATCH 1/2] lib/strtox: introduce kstrtoull_suffix() helper

On 2023/12/21 00:54, Andy Shevchenko wrote:
>
> On Wed, Dec 20, 2023 at 08:31:09PM +1030, Qu Wenruo wrote:
>> On 2023/12/20 20:24, Alexey Dobriyan wrote:
>>>> Just as mentioned in the comment of memparse(), the simple_stroull()
>>>> usage can lead to overflow all by itself.
>>>
>>> which is the root cause...
>>>
>>> I don't like one char suffixes. They are easy to integrate but then the
>>> _real_ suffixes are "MiB", "GiB", etc.
>>>
>>> If you care only about memparse(), then using _parse_integer() can be
>>> arranged. I don't see why not.
>>
>> Well, personally speaking I don't think we should even support the suffix at
>> all, at least for the only two usage inside btrfs.
>>
>> But unfortunately I'm not the one to do the final call, and the final call
>> is to keep the suffix behavior...
>>
>> And indeed using _parse_integer() with _parse_interger_fixup_radix() would
>> be better, as we don't need to extend the _kstrtoull() code base.
>
> My comment on the first patch got vanished due to my MTA issues, but I'll try
> to summarize my point here.
>
> First of all, I do not like the naming, it's too vague. What kind of suffix?
> Do we suppose to have suffix in the input? What will be the behaviour w/o
> suffix? And so on...

I would really like David Sterba to hear this, though.

To me, we should mark memparse() as deprecated as soon as possible, not
spreading the damn pandemic to any newer code.

The "convenience" is not an excuse to use incorrect code.

>
> Second, if it's a problem in memparse(), just fix it and that's all.

Nope, memparse() itself doesn't have any way to indicate errors.

It's not fixable in the first place, as long as you want a drop-in solution.
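
A rough userspace sketch of why the interface itself is the limitation.
demo_memparse() below is a hypothetical imitation of the memparse() calling
convention built on strtoull(); it is not the kernel code. Because the
return value is the parsed number, every 64-bit pattern is a legal result
and there is nothing left over to signal an error with.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical imitation of the memparse() interface: parse a number,
 * apply a one-character binary suffix, return the value directly.
 */
unsigned long long demo_memparse(const char *ptr, char **retptr)
{
	char *end;
	unsigned long long ret;
	unsigned int shift = 0;

	assert(ptr != NULL);
	ret = strtoull(ptr, &end, 0);

	switch (*end) {
	case 'E': case 'e': shift = 60; break;
	case 'P': case 'p': shift = 50; break;
	case 'T': case 't': shift = 40; break;
	case 'G': case 'g': shift = 30; break;
	case 'M': case 'm': shift = 20; break;
	case 'K': case 'k': shift = 10; break;
	default: break;
	}
	if (shift) {
		ret <<= shift;	/* high bits are silently discarded */
		end++;		/* consume the suffix character */
	}
	if (retptr)
		*retptr = end;
	return ret;
}
```

With this sketch, demo_memparse("25E", NULL) and demo_memparse("9E", NULL)
return the same value, 10376293541461622784, and the caller cannot tell
the two apart.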

>
> Third, as Alexey said, we have metric and byte suffixes and they are different.
> Supporting one without the other is just adding to the existing confusion.
>
> Last, but not least, we do NOT accept new code in the lib/ without test cases.
>
> So, that said here is my formal NAK for this series (at least in this form).

Then why the hell is memparse() there in the first place?
It doesn't have a test case (we have cmdline_kunit, but it doesn't test
memparse() at all), nor proper error detection.

I'm fine with my patch being rejected, but why the hell is memparse()
here in the first place?
It doesn't meet any of the standards you mentioned.

Thanks,
Qu

>
> P.S> The Subject should start with either kstrtox: or lib/kstrtox.c.
>

2023-12-21 12:01:12

by Andy Shevchenko

Subject: Re: [PATCH 1/2] lib/strtox: introduce kstrtoull_suffix() helper

On Thu, Dec 21, 2023 at 07:08:08AM +1030, Qu Wenruo wrote:
> On 2023/12/21 00:54, Andy Shevchenko wrote:
> > On Wed, Dec 20, 2023 at 08:31:09PM +1030, Qu Wenruo wrote:
> > > On 2023/12/20 20:24, Alexey Dobriyan wrote:
> > > > > Just as mentioned in the comment of memparse(), the simple_stroull()
> > > > > usage can lead to overflow all by itself.
> > > >
> > > > which is the root cause...
> > > >
> > > > I don't like one char suffixes. They are easy to integrate but then the
> > > > _real_ suffixes are "MiB", "GiB", etc.
> > > >
> > > > If you care only about memparse(), then using _parse_integer() can be
> > > > arranged. I don't see why not.
> > >
> > > Well, personally speaking I don't think we should even support the suffix at
> > > all, at least for the only two usage inside btrfs.
> > >
> > > But unfortunately I'm not the one to do the final call, and the final call
> > > is to keep the suffix behavior...
> > >
> > > And indeed using _parse_integer() with _parse_interger_fixup_radix() would
> > > be better, as we don't need to extend the _kstrtoull() code base.
> >
> > My comment on the first patch got vanished due to my MTA issues, but I'll try
> > to summarize my point here.
> >
> > First of all, I do not like the naming, it's too vague. What kind of suffix?
> > Do we suppose to have suffix in the input? What will be the behaviour w/o
> > suffix? And so on...
>
> I really like David Sterb to hear this though.

Me too, I like to hear opinions. But I will fight for the best we can do here.

> To me, we should mark memparse() as deprecated as soon as possible, not
> spreading the damn pandemic to any newer code.

Send a patch!

> The "convenience" is not an excuse to use incorrect code.

I do not object to this.

> > Second, if it's a problem in memparse(), just fix it and that's all.
>
> Nope, the memparse() itself doesn't have any way to indicate errors.
>
> It's not fixable in the first place, as long as you want a drop-in solution.
>
> > Third, as Alexey said, we have metric and byte suffixes and they are different.
> > Supporting one without the other is just adding to the existing confusion.
> >
> > Last, but not least, we do NOT accept new code in the lib/ without test cases.
> >
> > So, that said here is my formal NAK for this series (at least in this form).
>
> Then why there is the hell of memparse() in the first place?

You have all the means to investigate.
It used to be setup_mem() till 9b0f5889b12b ("Linux 2.2.18pre9"),
which in turn was split from setup_arch() in 716454f016a9 ("Import
2.1.121pre1")... Looking deeper, it seems it came about as a handy parser
for the mem= command line parameter a very long time ago.

> It doesn't have test case (we have cmdline_kunit, but it doesn't test
> memparse() at all), nor the proper error detection.

Exactly! It is someone's job to add this, and the best candidate is the one
who touches the code. See how cmdline_kunit appeared.

> I'm fine to get my patch rejected, but why the hell of memparse() is
> here in the first place?
> It doesn't fit any of the standard you mentioned.

So, what standard did we have in the above-mentioned (prehistoric) time?

> > P.S> The Subject should start with either kstrtox: or lib/kstrtox.c.

--
With Best Regards,
Andy Shevchenko

2023-12-21 20:38:35

by Qu Wenruo

Subject: Re: [PATCH 1/2] lib/strtox: introduce kstrtoull_suffix() helper

On 2023/12/21 22:30, Andy Shevchenko wrote:
> On Thu, Dec 21, 2023 at 07:08:08AM +1030, Qu Wenruo wrote:
>> On 2023/12/21 00:54, Andy Shevchenko wrote:
>>> On Wed, Dec 20, 2023 at 08:31:09PM +1030, Qu Wenruo wrote:
>>>> On 2023/12/20 20:24, Alexey Dobriyan wrote:
>>>>>> Just as mentioned in the comment of memparse(), the simple_stroull()
>>>>>> usage can lead to overflow all by itself.
>>>>>
>>>>> which is the root cause...
>>>>>
>>>>> I don't like one char suffixes. They are easy to integrate but then the
>>>>> _real_ suffixes are "MiB", "GiB", etc.
>>>>>
>>>>> If you care only about memparse(), then using _parse_integer() can be
>>>>> arranged. I don't see why not.
>>>>
>>>> Well, personally speaking I don't think we should even support the suffix at
>>>> all, at least for the only two usage inside btrfs.
>>>>
>>>> But unfortunately I'm not the one to do the final call, and the final call
>>>> is to keep the suffix behavior...
>>>>
>>>> And indeed using _parse_integer() with _parse_interger_fixup_radix() would
>>>> be better, as we don't need to extend the _kstrtoull() code base.
>>>
>>> My comment on the first patch got vanished due to my MTA issues, but I'll try
>>> to summarize my point here.
>>>
>>> First of all, I do not like the naming, it's too vague. What kind of suffix?
>>> Do we suppose to have suffix in the input? What will be the behaviour w/o
>>> suffix? And so on...
>>
>> I really like David Sterb to hear this though.
>
> Me too, I like to hear opinions. But I will fight for the best we can do here.
>
>> To me, we should mark memparse() as deprecated as soon as possible, not
>> spreading the damn pandemic to any newer code.
>
> Send a patch!
>
>> The "convenience" is not an excuse to use incorrect code.
>
> I do not object this.
>
>>> Second, if it's a problem in memparse(), just fix it and that's all.
>>
>> Nope, the memparse() itself doesn't have any way to indicate errors.
>>
>> It's not fixable in the first place, as long as you want a drop-in solution.
>>
>>> Third, as Alexey said, we have metric and byte suffixes and they are different.
>>> Supporting one without the other is just adding to the existing confusion.
>>>
>>> Last, but not least, we do NOT accept new code in the lib/ without test cases.
>>>
>>> So, that said here is my formal NAK for this series (at least in this form).
>>
>> Then why there is the hell of memparse() in the first place?
>
> You have all means to investigate.
> It used to be setup_mem() till 9b0f5889b12b ("Linux 2.2.18pre9"),
> which in turn was split from setup_arch() in 716454f016a9 ("Import
> 2.1.121pre1")... Looking deeper seems it comes as a parser at hand
> for the mem= command line parameter very long time ago.
>
>> It doesn't have test case (we have cmdline_kunit, but it doesn't test
>> memparse() at all), nor the proper error detection.
>
> Exactly! Someone's job to add this. And the best is the one who touches
> the code. See how cmdline_kunit appears.
>
>> I'm fine to get my patch rejected, but why the hell of memparse() is
>> here in the first place?
>> It doesn't fit any of the standard you mentioned.
>
> So, what standard did we have in above mentioned (prehistorical) time?

Fine, there was no standard in the ancient days.

Then what about going the following path for the whole memparse() rabbit
hole?

- Mark the old memparse() deprecated
- Add a new function memparse_safe() (or rename the older one to
  __memparse, and let the new one be named memparse()?)
- Add a unit test for the new memparse_safe() or whatever the name is
- Try my best to migrate as many call sites as possible
  Only the two btrfs ones I'm 100% confident about for now

Would that be a sound plan?
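
To give an idea of the shape such a function could take, here is a minimal
userspace sketch. Everything in it is an assumption for illustration: the
name demo_memparse_safe(), the signature, and the use of strtoull() and
POSIX errno values are mine and not an actual kernel API. The point is only
that the value moves to an output parameter so the return value is free to
carry an error code.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical sketch: returns 0 on success, -EINVAL if the string does
 * not start with a digit, -EOVERFLOW if the number or the suffix shift
 * does not fit in an unsigned long long.
 */
int demo_memparse_safe(const char *s, unsigned long long *out, char **retptr)
{
	const unsigned int bits = sizeof(unsigned long long) * CHAR_BIT;
	unsigned long long val;
	unsigned int shift = 0;
	char *end;

	assert(s != NULL && out != NULL);

	/* strtoull() accepts whitespace and a sign; insist on a digit. */
	if (!isdigit((unsigned char)s[0]))
		return -EINVAL;

	errno = 0;
	val = strtoull(s, &end, 0);
	if (errno == ERANGE)
		return -EOVERFLOW;

	switch (*end) {
	case 'E': case 'e': shift = 60; break;
	case 'P': case 'p': shift = 50; break;
	case 'T': case 't': shift = 40; break;
	case 'G': case 'g': shift = 30; break;
	case 'M': case 'm': shift = 20; break;
	case 'K': case 'k': shift = 10; break;
	default: break;
	}
	if (shift) {
		/* Any bit that would be shifted out means overflow. */
		if (val >> (bits - shift))
			return -EOVERFLOW;
		val <<= shift;
		end++;
	}

	*out = val;
	if (retptr)
		*retptr = end;
	return 0;
}
```

In this sketch "25E" is rejected with -EOVERFLOW instead of wrapping, while
"15E" still parses because 15 fits in the 4 bits left above the shift. The
suffix list is hard-coded here for brevity; the original patch proposed
letting the caller choose which suffixes are accepted.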

Thanks,
Qu
>
>>> P.S> The Subject should start with either kstrtox: or lib/kstrtox.c.
>

2023-12-21 20:55:40

by Andy Shevchenko

Subject: Re: [PATCH 1/2] lib/strtox: introduce kstrtoull_suffix() helper

On Fri, Dec 22, 2023 at 07:07:31AM +1030, Qu Wenruo wrote:
> On 2023/12/21 22:30, Andy Shevchenko wrote:

...

> Then what about going the following path for the whole memparse() rabbit
> hole?
>
> - Mark the old memparse() deprecated
> - Add a new function memparse_safe() (or rename the older one to
> __memparse, and let the new one to be named memparse()?)
> - Add unit test for the new memparse_safe() or whatever the name is
> - Try my best to migrate as many call sites as possible
> Only the two btrfs ones I'm 100% confident for now
>
> Would that be a sounding plan?

As long as it does not break any ABI (like kernel command line parsing),
sounds good to me.

--
With Best Regards,
Andy Shevchenko