Hi @ll,
both <https://www.kernel.org/doc/htmldocs/kernel-api/API-sscanf.html>
and <https://www.kernel.org/doc/htmldocs/kernel-api/API-vsscanf.html>
are rather terse and fail to specify the supported arguments and their
conversion specifiers/modifiers.
<https://www.kernel.org/doc/htmldocs/kernel-api/libc.html#id-1.4.3>
tells OTOH:
| The behaviour of these functions may vary slightly from those
| defined by ANSI, and these deviations are noted in the text.
There is but no text (see above) despite multiple deviations from
ANSI C
<https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/lib/vsprintf.c?h=v5.12>
| /* '%*[' not yet supported, invalid format */
...
| /*
| * Warning: This implementation of the '[' conversion specifier
| * deviates from its glibc counterpart in the following ways:
...
More deviations (just from reading the source):
1. no support for %p
2. no support for conversion modifiers j and t
3. no support for multibyte characters and strings, i.e. %<width>c
and %<width>s may split UTF-8 codepoints
4. accepts %[<width>]<modifier>[c|s], but ignores all conversion
modifiers
5. treats %<width><modifier>% (and combinations) as %%
6. accepts %<width><modifier>n (and combinations)
7. doesn't scan the input for %[...]n
8. uses simple_strto[u]l for the conversion modifier z, i.e. assigns
uint32_t to size_t, resulting in truncation
Is this intended?
If not: patch to fix 5. and 6. and simplify the qualifier handling
attached
Stefan Kanthak
It is so stupendously hard to use scanf() safely
the best thing is probably to just delete it :-)
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
On 04/05/2021 21.19, Stefan Kanthak wrote:
> Hi @ll,
>
> both <https://www.kernel.org/doc/htmldocs/kernel-api/API-sscanf.html>
> and <https://www.kernel.org/doc/htmldocs/kernel-api/API-vsscanf.html>
> are rather terse and fail to specify the supported arguments and their
> conversion specifiers/modifiers.
>
> <https://www.kernel.org/doc/htmldocs/kernel-api/libc.html#id-1.4.3>
> tells OTOH:
>
> | The behaviour of these functions may vary slightly from those
> | defined by ANSI, and these deviations are noted in the text.
>
> There is but no text (see above) despite multiple deviations from
> ANSI C
>
> <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/lib/vsprintf.c?h=v5.12>
>
> | /* '%*[' not yet supported, invalid format */
> ...
> | /*
> | * Warning: This implementation of the '[' conversion specifier
> | * deviates from its glibc counterpart in the following ways:
> ...
>
> More deviations (just from reading the source):
>
> 1. no support for %p
What on earth good would that do in the kernel?
> 2. no support for conversion modifiers j and t
Could be added, but do you have a user?
> 3. no support for multibyte characters and strings, i.e. %<width>c
> and %<width>s may split UTF-8 codepoints
So what? The kernel doesn't do a lot of text processing and wchar_t stuff.
> 4. accepts %[<width>]<modifier>[c|s], but ignores all conversion
> modifiers
Yeah, %ls is technically accepted and treated as %s, that's mostly for
ease of parsing it seems. Do you have a use case where you'd want wchar_ts?
> 5. treats %<width><modifier>% (and combinations) as %%
What would you expect it to do? Seems to be a non-issue, gcc flags that
nonsense just fine
vs.c: In function ?v?:
vs.c:5:18: warning: conversion lacks type at end of format [-Wformat=]
5 | x = sscanf(s, "%l% %d", &y);
| ^
vs.c:5:20: warning: unknown conversion type character ? ? in format
[-Wformat=]
5 | x = sscanf(s, "%l% %d", &y);
| ^
> 6. accepts %<width><modifier>n (and combinations)
Again, non-issue (warning: field width used with ?%n? gnu_scanf format)
> 7. doesn't scan the input for %[...]n
? What's that supposed to mean.
> 8. uses simple_strto[u]l for the conversion modifier z, i.e. assigns
> uint32_t to size_t, resulting in truncation
Where do you see uint32_t? The code is
val.u = qualifier != 'L' ?
simple_strtoul(str, &next, base) :
simple_strtoull(str, &next, base);
case 'z':
*va_arg(args, size_t *) = val.u;
break;
so the conversion is done with simple_strtoul which return "unsigned
long". And size_t is either a typedef for "unsigned long" or "unsigned
int", so yes, of course a truncation may happen, but if the value
actually fits in a size_t, it also fits in unsigned long (as returned
from simple_strtoul) and unsigned long long (as stored in val.u).
Rasmus
Rasmus Villemoes <[email protected]> wrote:
> On 04/05/2021 21.19, Stefan Kanthak wrote:
>> Hi @ll,
>>
>> both <https://www.kernel.org/doc/htmldocs/kernel-api/API-sscanf.html>
>> and <https://www.kernel.org/doc/htmldocs/kernel-api/API-vsscanf.html>
>> are rather terse and fail to specify the supported arguments and their
>> conversion specifiers/modifiers.
>>
>> <https://www.kernel.org/doc/htmldocs/kernel-api/libc.html#id-1.4.3>
>> tells OTOH:
>>
>> | The behaviour of these functions may vary slightly from those
>> | defined by ANSI, and these deviations are noted in the text.
>>
>> There is but no text (see above) despite multiple deviations from
>> ANSI C
>>
>> <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/lib/vsprintf.c?h=v5.12>
>>
>> | /* '%*[' not yet supported, invalid format */
>> ...
>> | /*
>> | * Warning: This implementation of the '[' conversion specifier
>> | * deviates from its glibc counterpart in the following ways:
>> ...
>>
>> More deviations (just from reading the source):
>>
>> 1. no support for %p
>
> What on earth good would that do in the kernel?
| The behaviour of these functions may vary slightly from those
| defined by ANSI, and these deviations are noted in the text.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> 2. no support for conversion modifiers j and t
>
> Could be added, but do you have a user?
Just fix your documentation.
>> 3. no support for multibyte characters and strings, i.e. %<width>c
>> and %<width>s may split UTF-8 codepoints
>
> So what?
It's a BUG!
> The kernel doesn't do a lot of text processing and wchar_t stuff.
Nobody will ever feed a UTF-8 string to the kernel?
>> 4. accepts %[<width>]<modifier>[c|s], but ignores all conversion
>> modifiers
>
> Yeah, %ls is technically accepted and treated as %s,
just like %Ls and %Hs and %hhs and %zs ... what the documentation
but fails to tell: just fix it.
> that's mostly for ease of parsing it seems. Do you have a use
> case where you'd want wchar_ts?
>> 5. treats %<width><modifier>% (and combinations) as %%
>
> What would you expect it to do?
See the patch: stop and return the number of converted items, like
an ANSI/ISO conformant scanf()
> Seems to be a non-issue, gcc flags that nonsense just fine
Nobody will ever feed a non-constant format string to [v]sscanf()?
>> 6. accepts %<width><modifier>n (and combinations)
>
> Again, non-issue (warning: field width used with ?%n? gnu_scanf format)
How does gnu_scanf() handle %0Ln etc.?
Does a warning stop compilation of the kernel?
See above: it's undocumented, and it's not flagged in calls with
non-constant format string.
>> 7. doesn't scan the input for %[...]n
>
> ? What's that supposed to mean.
Argh, my fault: should have been %*
>> 8. uses simple_strto[u]l for the conversion modifier z, i.e. assigns
>> uint32_t to size_t, resulting in truncation
>
> Where do you see uint32_t?
LLP64 vs. LP64, so my last point is invalid.
Stefan