2023-12-20 00:10:46

by Qu Wenruo

[permalink] [raw]
Subject: [PATCH v2 1/2] lib/strtox: introduce kstrtoull_suffix() helper

Just as mentioned in the comment of memparse(), the simple_stroull()
usage can lead to overflow all by itself.

Furthermore, the suffix calculation is also super overflow prone because
that some suffix like "E" itself would eat 60bits, leaving only 4 bits
available.

And that suffix "E" can also lead to confusion since it's using the same
char of hex Ox'E'.

One simple example to expose all the problem is to use memparse() on
"25E".
The correct value should be 28823037615171174400, but the suffix E makes
it super simple to overflow, resulting the incorrect value
10376293541461622784 (9E).

So here we introduce a new helper to address the problem,
kstrtoull_suffix():

- Enhance _kstrtoull()
This allow _kstrtoull() to return even if it hits an invalid char, as
long as the optional parameter @retptr is provided.

If @retptr is provided, _kstrtoull() would try its best to parse the
valid part, and leave the remaining to be handled by the caller.

If @retptr is not provided, the behavior is not altered.

- New kstrtoull_suffix() helper
This new helper utilize the new @retptr capability of _kstrtoull(),
and provides 2 new ability:

* Allow certain suffixes to be chosen
The recommended suffix list is "KMGTP" (using the new unit_suffix
enum as a bitmap), excluding the overflow prone "E".
Undermost cases there is really no need to use "E" suffix anyway.
And for those who really need that exabytes suffix, they can enable
that suffix pretty easily.

* Add overflow checks for the suffixes
If the original number string is fine, but with the extra left
shift overflow happens, then -EOVERFLOW is returned.

Cc: Andrew Morton <[email protected]>
Cc: Christophe JAILLET <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: David Laight <[email protected]>
Cc: [email protected]
Signed-off-by: Qu Wenruo <[email protected]>

----
Changelog:
v2:
- Use enum bitmap to describe the suffixes
This gets rid of the upper/lower case problem, and enum makes it
a little more readable.

- Fix the suffix overflow detection

- Move the left shift out of the switch block

- Remove the "done" tag
Since no tailing character can already be handled properly.
---
include/linux/kstrtox.h | 20 ++++++++
lib/kstrtox.c | 108 ++++++++++++++++++++++++++++++++++++++--
2 files changed, 123 insertions(+), 5 deletions(-)

diff --git a/include/linux/kstrtox.h b/include/linux/kstrtox.h
index 7fcf29a4e0de..edac52d18a8e 100644
--- a/include/linux/kstrtox.h
+++ b/include/linux/kstrtox.h
@@ -9,6 +9,26 @@
int __must_check _kstrtoul(const char *s, unsigned int base, unsigned long *res);
int __must_check _kstrtol(const char *s, unsigned int base, long *res);

+/*
+ * The default suffix list would not include "E" since it's too easy to overflow
+ * and not much real world usage.
+ */
+enum unit_suffix {
+ SUFFIX_K = (1 << 0),
+ SUFFIX_M = (1 << 1),
+ SUFFIX_G = (1 << 2),
+ SUFFIX_T = (1 << 3),
+ SUFFIX_P = (1 << 4),
+ SUFFIX_E = (1 << 5),
+};
+
+/*
+ * The default suffix list would not include "E" since it's too easy to overflow
+ * and not much real world usage.
+ */
+#define KSTRTOULL_SUFFIX_DEFAULT (SUFFIX_K | SUFFIX_M | SUFFIX_G | SUFFIX_T | SUFFIX_P)
+int kstrtoull_suffix(const char *s, unsigned int base, unsigned long long *res,
+ enum unit_suffix suffixes);
int __must_check kstrtoull(const char *s, unsigned int base, unsigned long long *res);
int __must_check kstrtoll(const char *s, unsigned int base, long long *res);

diff --git a/lib/kstrtox.c b/lib/kstrtox.c
index d586e6af5e5a..8a2fdd1e3376 100644
--- a/lib/kstrtox.c
+++ b/lib/kstrtox.c
@@ -93,7 +93,8 @@ unsigned int _parse_integer(const char *s, unsigned int base, unsigned long long
return _parse_integer_limit(s, base, p, INT_MAX);
}

-static int _kstrtoull(const char *s, unsigned int base, unsigned long long *res)
+static int _kstrtoull(const char *s, unsigned int base, unsigned long long *res,
+ char **retptr)
{
unsigned long long _res;
unsigned int rv;
@@ -105,11 +106,19 @@ static int _kstrtoull(const char *s, unsigned int base, unsigned long long *res)
if (rv == 0)
return -EINVAL;
s += rv;
- if (*s == '\n')
+
+ /*
+ * If @retptr is provided, caller is responsible to detect
+ * the extra chars, otherwise we can skip one newline.
+ */
+ if (!retptr && *s == '\n')
s++;
- if (*s)
+ if (!retptr && *s)
return -EINVAL;
+
*res = _res;
+ if (retptr)
+ *retptr = (char *)s;
return 0;
}

@@ -133,10 +142,99 @@ int kstrtoull(const char *s, unsigned int base, unsigned long long *res)
{
if (s[0] == '+')
s++;
- return _kstrtoull(s, base, res);
+ return _kstrtoull(s, base, res, NULL);
}
EXPORT_SYMBOL(kstrtoull);

+/**
+ * kstrtoull_suffix - convert a string to ull with suffixes support
+ * @s: The start of the string. The string must be null-terminated, and may also
+ * include a single newline before its terminating null.
+ * @base: The number base to use. The maximum supported base is 16. If base is
+ * given as 0, then the base of the string is automatically detected with the
+ * conventional semantics - If it begins with 0x the number will be parsed as a
+ * hexadecimal (case insensitive), if it otherwise begins with 0, it will be
+ * parsed as an octal number. Otherwise it will be parsed as a decimal.
+ * @res: Where to write the result of the conversion on success.
+ * @suffixes: bitmap of acceptable suffixes, unknown bits would be ignored.
+ *
+ * Return 0 on success.
+ *
+ * Return -ERANGE on overflow or -EINVAL if invalid chars found.
+ * Return value must be checked.
+ */
+int kstrtoull_suffix(const char *s, unsigned int base, unsigned long long *res,
+ enum unit_suffix suffixes)
+{
+ unsigned long long value;
+ int shift = 0;
+ char *endptr;
+ int ret;
+
+ ret = _kstrtoull(s, base, &value, &endptr);
+ /* Either already overflow or no number string at all. */
+ if (ret < 0)
+ return ret;
+
+ switch (*endptr) {
+ case 'K':
+ case 'k':
+ if (!(suffixes & SUFFIX_K))
+ return -EINVAL;
+ shift = 10;
+ break;
+ case 'M':
+ case 'm':
+ if (!(suffixes & SUFFIX_M))
+ return -EINVAL;
+ shift = 20;
+ break;
+ case 'G':
+ case 'g':
+ if (!(suffixes & SUFFIX_G))
+ return -EINVAL;
+ shift = 30;
+ break;
+ case 'T':
+ case 't':
+ if (!(suffixes & SUFFIX_T))
+ return -EINVAL;
+ shift = 40;
+ break;
+ case 'P':
+ case 'p':
+ if (!(suffixes & SUFFIX_P))
+ return -EINVAL;
+ shift = 50;
+ break;
+ case 'E':
+ case 'e':
+ if (!(suffixes & SUFFIX_E))
+ return -EINVAL;
+ shift = 60;
+ break;
+ }
+ if (shift)
+ endptr++;
+ if (*endptr == '\n')
+ endptr++;
+ if (*endptr)
+ return -EINVAL;
+
+ /*
+ * Overflow check.
+ *
+ * Check @shift before doing right shift, as if right shift bit is
+ * greater than or equal to the number of bits, the result is undefined.
+ */
+ if (shift && value >> (64 - shift))
+ return -EOVERFLOW;
+ value <<= shift;
+ *res = value;
+ return 0;
+}
+EXPORT_SYMBOL(kstrtoull_suffix);
+
/**
* kstrtoll - convert a string to a long long
* @s: The start of the string. The string must be null-terminated, and may also
@@ -159,7 +257,7 @@ int kstrtoll(const char *s, unsigned int base, long long *res)
int rv;

if (s[0] == '-') {
- rv = _kstrtoull(s + 1, base, &tmp);
+ rv = _kstrtoull(s + 1, base, &tmp, NULL);
if (rv < 0)
return rv;
if ((long long)-tmp > 0)
--
2.43.0



2023-12-20 05:39:25

by David Disseldorp

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] lib/strtox: introduce kstrtoull_suffix() helper

On Wed, 20 Dec 2023 10:40:00 +1030, Qu Wenruo wrote:

> Just as mentioned in the comment of memparse(), the simple_stroull()
> usage can lead to overflow all by itself.
>
> Furthermore, the suffix calculation is also super overflow prone because
> that some suffix like "E" itself would eat 60bits, leaving only 4 bits
> available.
>
> And that suffix "E" can also lead to confusion since it's using the same
> char of hex Ox'E'.
>
> One simple example to expose all the problem is to use memparse() on
> "25E".
> The correct value should be 28823037615171174400, but the suffix E makes
> it super simple to overflow, resulting the incorrect value
> 10376293541461622784 (9E).
>
> So here we introduce a new helper to address the problem,
> kstrtoull_suffix():
>
> - Enhance _kstrtoull()
> This allow _kstrtoull() to return even if it hits an invalid char, as
> long as the optional parameter @retptr is provided.
>
> If @retptr is provided, _kstrtoull() would try its best to parse the
> valid part, and leave the remaining to be handled by the caller.
>
> If @retptr is not provided, the behavior is not altered.
>
> - New kstrtoull_suffix() helper
> This new helper utilize the new @retptr capability of _kstrtoull(),
> and provides 2 new ability:
>
> * Allow certain suffixes to be chosen
> The recommended suffix list is "KMGTP" (using the new unit_suffix
> enum as a bitmap), excluding the overflow prone "E".
> Undermost cases there is really no need to use "E" suffix anyway.
> And for those who really need that exabytes suffix, they can enable
> that suffix pretty easily.
>
> * Add overflow checks for the suffixes
> If the original number string is fine, but with the extra left
> shift overflow happens, then -EOVERFLOW is returned.
>
> Cc: Andrew Morton <[email protected]>
> Cc: Christophe JAILLET <[email protected]>
> Cc: Andy Shevchenko <[email protected]>
> Cc: David Laight <[email protected]>
> Cc: [email protected]
> Signed-off-by: Qu Wenruo <[email protected]>
>
> ----
> Changelog:
> v2:
> - Use enum bitmap to describe the suffixes
> This gets rid of the upper/lower case problem, and enum makes it
> a little more readable.
>
> - Fix the suffix overflow detection
>
> - Move the left shift out of the switch block
>
> - Remove the "done" tag
> Since no tailing character can already be handled properly.

nit: git am puts this changelog in the commit message when applied.
Please use `git send-email --annotate` and put it next to the diffstat,
so that it gets discarded.

> ---
> include/linux/kstrtox.h | 20 ++++++++
> lib/kstrtox.c | 108 ++++++++++++++++++++++++++++++++++++++--
> 2 files changed, 123 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/kstrtox.h b/include/linux/kstrtox.h
> index 7fcf29a4e0de..edac52d18a8e 100644
> --- a/include/linux/kstrtox.h
> +++ b/include/linux/kstrtox.h
> @@ -9,6 +9,26 @@
> int __must_check _kstrtoul(const char *s, unsigned int base, unsigned long *res);
> int __must_check _kstrtol(const char *s, unsigned int base, long *res);
>
> +/*
> + * The default suffix list would not include "E" since it's too easy to overflow
> + * and not much real world usage.
> + */
> +enum unit_suffix {
> + SUFFIX_K = (1 << 0),
> + SUFFIX_M = (1 << 1),
> + SUFFIX_G = (1 << 2),
> + SUFFIX_T = (1 << 3),
> + SUFFIX_P = (1 << 4),
> + SUFFIX_E = (1 << 5),
> +};
> +
> +/*
> + * The default suffix list would not include "E" since it's too easy to overflow
> + * and not much real world usage.
> + */

^ this comment is a duplicate.

> +#define KSTRTOULL_SUFFIX_DEFAULT (SUFFIX_K | SUFFIX_M | SUFFIX_G | SUFFIX_T | SUFFIX_P)

I think it'd be clearer if you dropped this default and had callers
explicitly provide the desired suffix mask.

> +int kstrtoull_suffix(const char *s, unsigned int base, unsigned long long *res,
> + enum unit_suffix suffixes);

__must_check would be consistent with the other helpers...

> int __must_check kstrtoull(const char *s, unsigned int base, unsigned long long *res);
> int __must_check kstrtoll(const char *s, unsigned int base, long long *res);
>
> diff --git a/lib/kstrtox.c b/lib/kstrtox.c
> index d586e6af5e5a..8a2fdd1e3376 100644
> --- a/lib/kstrtox.c
> +++ b/lib/kstrtox.c
> @@ -93,7 +93,8 @@ unsigned int _parse_integer(const char *s, unsigned int base, unsigned long long
> return _parse_integer_limit(s, base, p, INT_MAX);
> }
>
> -static int _kstrtoull(const char *s, unsigned int base, unsigned long long *res)
> +static int _kstrtoull(const char *s, unsigned int base, unsigned long long *res,
> + char **retptr)
> {
> unsigned long long _res;
> unsigned int rv;
> @@ -105,11 +106,19 @@ static int _kstrtoull(const char *s, unsigned int base, unsigned long long *res)
> if (rv == 0)
> return -EINVAL;
> s += rv;
> - if (*s == '\n')
> +
> + /*
> + * If @retptr is provided, caller is responsible to detect
> + * the extra chars, otherwise we can skip one newline.
> + */
> + if (!retptr && *s == '\n')
> s++;
> - if (*s)
> + if (!retptr && *s)
> return -EINVAL;
> +
> *res = _res;
> + if (retptr)
> + *retptr = (char *)s;
> return 0;
> }
>
> @@ -133,10 +142,99 @@ int kstrtoull(const char *s, unsigned int base, unsigned long long *res)
> {
> if (s[0] == '+')
> s++;
> - return _kstrtoull(s, base, res);
> + return _kstrtoull(s, base, res, NULL);
> }
> EXPORT_SYMBOL(kstrtoull);
>
> +/**
> + * kstrtoull_suffix - convert a string to ull with suffixes support
> + * @s: The start of the string. The string must be null-terminated, and may also
> + * include a single newline before its terminating null.
> + * @base: The number base to use. The maximum supported base is 16. If base is
> + * given as 0, then the base of the string is automatically detected with the
> + * conventional semantics - If it begins with 0x the number will be parsed as a
> + * hexadecimal (case insensitive), if it otherwise begins with 0, it will be
> + * parsed as an octal number. Otherwise it will be parsed as a decimal.
> + * @res: Where to write the result of the conversion on success.
> + * @suffixes: bitmap of acceptable suffixes, unknown bits would be ignored.
> + *
> + * Return 0 on success.
> + *
> + * Return -ERANGE on overflow or -EINVAL if invalid chars found.
> + * Return value must be checked.
> + */
> +int kstrtoull_suffix(const char *s, unsigned int base, unsigned long long *res,
> + enum unit_suffix suffixes)
> +{
> + unsigned long long value;
> + int shift = 0;
> + char *endptr;
> + int ret;
> +
> + ret = _kstrtoull(s, base, &value, &endptr);
> + /* Either already overflow or no number string at all. */
> + if (ret < 0)
> + return ret;
> +
> + switch (*endptr) {
> + case 'K':
> + case 'k':
> + if (!(suffixes & SUFFIX_K))
> + return -EINVAL;
> + shift = 10;
> + break;
> + case 'M':
> + case 'm':
> + if (!(suffixes & SUFFIX_M))
> + return -EINVAL;
> + shift = 20;
> + break;
> + case 'G':
> + case 'g':
> + if (!(suffixes & SUFFIX_G))
> + return -EINVAL;
> + shift = 30;
> + break;
> + case 'T':
> + case 't':
> + if (!(suffixes & SUFFIX_T))
> + return -EINVAL;
> + shift = 40;
> + break;
> + case 'P':
> + case 'p':
> + if (!(suffixes & SUFFIX_P))
> + return -EINVAL;
> + shift = 50;
> + break;
> + case 'E':
> + case 'e':
> + if (!(suffixes & SUFFIX_E))
> + return -EINVAL;
> + shift = 60;
> + break;
> + }
> + if (shift)
> + endptr++;
> + if (*endptr == '\n')
> + endptr++;
> + if (*endptr)
> + return -EINVAL;
> +
> + /*
> + * Overflow check.
> + *
> + * Check @shift before doing right shift, as if right shift bit is
> + * greater than or equal to the number of bits, the result is undefined.
> + */
> + if (shift && value >> (64 - shift))
> + return -EOVERFLOW;
> + value <<= shift;

Might as well roll this logic in with the existing "if (shift)"
above, e.g.
if (shift) {
/*
* Overflow check...
*/
if (value >> (64 - shift))
return -EOVERFLOW;
value <<= shift;
endptr++;
}
if (*endptr == '\n')
...

> + *res = value;
> + return 0;
> +}
> +EXPORT_SYMBOL(kstrtoull_suffix);
> +
> /**
> * kstrtoll - convert a string to a long long
> * @s: The start of the string. The string must be null-terminated, and may also
> @@ -159,7 +257,7 @@ int kstrtoll(const char *s, unsigned int base, long long *res)
> int rv;
>
> if (s[0] == '-') {
> - rv = _kstrtoull(s + 1, base, &tmp);
> + rv = _kstrtoull(s + 1, base, &tmp, NULL);
> if (rv < 0)
> return rv;
> if ((long long)-tmp > 0)


With the above changes made, feel free to add
Reviewed-by: David Disseldorp <[email protected]>

I'll leave the review of patch 2/2 up to others, as I'm still a little
worried about sysfs trailing whitespace regressions.

2023-12-20 14:20:29

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] lib/strtox: introduce kstrtoull_suffix() helper

On Wed, Dec 20, 2023 at 10:40:00AM +1030, Qu Wenruo wrote:
> Just as mentioned in the comment of memparse(), the simple_stroull()
> usage can lead to overflow all by itself.
>
> Furthermore, the suffix calculation is also super overflow prone because
> that some suffix like "E" itself would eat 60bits, leaving only 4 bits
> available.
>
> And that suffix "E" can also lead to confusion since it's using the same
> char of hex Ox'E'.
>
> One simple example to expose all the problem is to use memparse() on
> "25E".
> The correct value should be 28823037615171174400, but the suffix E makes
> it super simple to overflow, resulting the incorrect value
> 10376293541461622784 (9E).
>
> So here we introduce a new helper to address the problem,
> kstrtoull_suffix():
>
> - Enhance _kstrtoull()
> This allow _kstrtoull() to return even if it hits an invalid char, as
> long as the optional parameter @retptr is provided.
>
> If @retptr is provided, _kstrtoull() would try its best to parse the
> valid part, and leave the remaining to be handled by the caller.
>
> If @retptr is not provided, the behavior is not altered.
>
> - New kstrtoull_suffix() helper
> This new helper utilize the new @retptr capability of _kstrtoull(),
> and provides 2 new ability:
>
> * Allow certain suffixes to be chosen
> The recommended suffix list is "KMGTP" (using the new unit_suffix
> enum as a bitmap), excluding the overflow prone "E".
> Undermost cases there is really no need to use "E" suffix anyway.
> And for those who really need that exabytes suffix, they can enable
> that suffix pretty easily.
>
> * Add overflow checks for the suffixes
> If the original number string is fine, but with the extra left
> shift overflow happens, then -EOVERFLOW is returned.

NAK. Read the v1 discussion why.

--
With Best Regards,
Andy Shevchenko



2023-12-23 09:58:12

by Qu Wenruo

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] lib/strtox: introduce kstrtoull_suffix() helper



On 2023/12/20 16:08, David Disseldorp wrote:
> On Wed, 20 Dec 2023 10:40:00 +1030, Qu Wenruo wrote:
>
[...]
>>
>> ----
>> Changelog:
>> v2:
>> - Use enum bitmap to describe the suffixes
>> This gets rid of the upper/lower case problem, and enum makes it
>> a little more readable.
>>
>> - Fix the suffix overflow detection
>>
>> - Move the left shift out of the switch block
>>
>> - Remove the "done" tag
>> Since no tailing character can already be handled properly.
>
> nit: git am puts this changelog in the commit message when applied.
> Please use `git send-email --annotate` and put it next to the diffstat,
> so that it gets discarded.

Got it.

[...]
>> +};
>> +
>> +/*
>> + * The default suffix list would not include "E" since it's too easy to overflow
>> + * and not much real world usage.
>> + */
>
> ^ this comment is a duplicate.
>
>> +#define KSTRTOULL_SUFFIX_DEFAULT (SUFFIX_K | SUFFIX_M | SUFFIX_G | SUFFIX_T | SUFFIX_P)
>
> I think it'd be clearer if you dropped this default and had callers
> explicitly provide the desired suffix mask.

Well, that would be long, and would be even longer as the newer naming
would be MEMPARSE_SUFFIX_*, to be more explicit on what the suffix is for...

And I really want callers to choose a saner default suffix, thus here
comes the default one.

In fact, in my next version, I also found that there are some memparse()
call sites benefits from the newer suffixes (although won't for the "E"
one).
The example is the call site setup_elfcorehdr(). Where the comment only
mentions KMG, but since memparse() silently added "PE" suffixes, maybe
on some mainframes we saved some time for one or two lucky admins.

[...]
>
>
> With the above changes made, feel free to add
> Reviewed-by: David Disseldorp <[email protected]>

Thanks for the review, but I'm afraid the newer version would be another
beast.

All the ommitted comments would be addressed a in new series.
>
> I'll leave the review of patch 2/2 up to others, as I'm still a little
> worried about sysfs trailing whitespace regressions.

That won't be a problem anymore, the new series would keep the old
@retptr behavior, thus for btrfs part it won't be changed at all.

Thanks,
Qu
>

2023-12-27 05:55:33

by David Disseldorp

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] lib/strtox: introduce kstrtoull_suffix() helper

On Sat, 23 Dec 2023 20:27:30 +1030, Qu Wenruo wrote:

> On 2023/12/20 16:08, David Disseldorp wrote:
...
> >> +#define KSTRTOULL_SUFFIX_DEFAULT (SUFFIX_K | SUFFIX_M | SUFFIX_G | SUFFIX_T | SUFFIX_P)
> >
> > I think it'd be clearer if you dropped this default and had callers
> > explicitly provide the desired suffix mask.
>
> Well, that would be long, and would be even longer as the newer naming
> would be MEMPARSE_SUFFIX_*, to be more explicit on what the suffix is for...
>
> And I really want callers to choose a saner default suffix, thus here
> comes the default one.
>
> In fact, in my next version, I also found that there are some memparse()
> call sites benefits from the newer suffixes (although won't for the "E"
> one).
> The example is the call site setup_elfcorehdr(). Where the comment only
> mentions KMG, but since memparse() silently added "PE" suffixes, maybe
> on some mainframes we saved some time for one or two lucky admins.

I think it's a sane default, my concern is that _DEFAULT says nothing
about supported units from the caller's perspective. Perhaps
MEMPARSE_SUFFIX_KMGTP or MEMPARSE_UNITS_KMGTP would be clearer.

...
> > With the above changes made, feel free to add
> > Reviewed-by: David Disseldorp <[email protected]>
>
> Thanks for the review, but I'm afraid the newer version would be another
> beast.
>
> All the ommitted comments would be addressed a in new series.
> >
> > I'll leave the review of patch 2/2 up to others, as I'm still a little
> > worried about sysfs trailing whitespace regressions.
>
> That won't be a problem anymore, the new series would keep the old
> @retptr behavior, thus for btrfs part it won't be changed at all.

Sounds good. Will follow up there.

Cheers, David