2021-06-27 19:31:24

by Alejandro Colomar

[permalink] [raw]
Subject: [RFC] strcpys(): New function for copying strings safely

Hi,

I recently re-read the discussions about strscpy() and strlcpy().

https://lwn.net/Articles/612244/
https://sourceware.org/legacy-ml/libc-alpha/2000-08/msg00052.html
https://mafford.com/text/the-many-ways-to-copy-a-string-in-c/
https://lkml.org/lkml/2017/7/14/637
https://lwn.net/Articles/659214/

Arguments against strlcpy():

- Inefficiency (compared to memcpy())
- It doesn't report truncation, needing extra checks by the user
- It doesn't prevent overrunning the source string (requires a C-string
input)

Arguments in favor of strlcpy():

- Avoid code bloat
- Self-documenting code
- Avoid everybody rolling their own strcat() safe variant
- Prevent buffer overflows


I rolled my own strcpy safer functions some time ago, and after reading
those discussions, I decided to propose them for inclusion in glibc.

I think they address all of the arguments before (well, efficiency will
never be as good as memcpy(), I guess, but a good compiler, and -flto
can make it good enough).

It reports two kinds of errors: "hard" errors, where the string is
invalid, and "soft" errors, where the string is valid but truncated.
The method for reporting the errors is an 'int' return value, which is 0
for success, -1 for "hard" error, and 1 for "soft" error.

The value of written characters that strcpy() functions typically
return, is now moved to a pointer parameter (4th parameter).

The order of the parameters has been changed (compared to other strcpy()
typicall variants), according to a new principle of the C2x standard,
which says that "the size of an array appears before the array".
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2086.htm>

It doesn't require a C string in the input. It only reads up to the
size limit provided by the user.

I added the _np suffix to the function to mark it as non-portable
(similar to existing practice in glibc with non-standard functions).

I used 'ssize_t' instead of 'size_t', because I consider unsigned types
to be inherently unsafe. See
<https://google.github.io/styleguide/cppguide.html#Integer_Types>.

It is designed so that usage requires the minimum number of lines of
code for complete usage (including error handling checks):

[[
// When we already checked that 'size' is >= 1
// and truncation is not an issue:

strcpys_np(size, dest, src, NULL);

// When we ddidn't check the value of 'size',
// but truncation is still not an issue:

if (strcpys_np(size, dest, src, NULL) == -1)
goto handle_hard_error;

// When truncation is an issue:

if (strcpys_np(size, dest, src, NULL))
goto handle_all_errors;

// When truncation is an error,
// but it requires a different handling than a "hard" error:

status = strcpys_np(size, dest, src, NULL);
if (status == 1)
goto handle_truncation;
if (status)
goto handle_hard_error;
]]

After any of those samples of code, the string has been copied, and is a
valid C-string.


Here goes the code (strcpys_np() is defined in terms of strscpy_np(),
which similar to the known strscpy(), but with some of the improvements
mentioned above, such as using array parameters, and ssize_t):


[[

#include <string.h>
#include <sys/types.h>


[[gnu::nonnull]]
ssize_t strscpy_np(ssize_t size,
char dest[static restrict size],
const char src[static restrict size])
{
ssize_t len;

if (size <= 0)
return -1;

len = strnlen(src, size - 1);
memcpy(dest, src, len);
dest[len] = '\0';

return len;
}

[[gnu::nonnull(2, 3)]]
int strcpys_np(ssize_t size,
char dest[static restrict size],
const char src[static restrict size],
ssize_t *restrict len)
{
ssize_t l;

l = strscpy_np(size, dest, src);
if (len)
*len = l;

if (l == -1)
return -1;
if (l >= size)
return 1;
return 0;
}

]]

I may have introduced some bugs right now, as I adapted the code a bit
before sending, but I expect it to be free of any bugs known of the
existing str*cpy() interfaces.


What do you think about this function? Would you want to add it to glibc?


Thanks,

Alex



--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/


2021-06-27 20:04:57

by Alejandro Colomar

[permalink] [raw]
Subject: Re: [RFC] strcpys(): New function for copying strings safely

On 6/27/21 9:26 PM, Alejandro Colomar (man-pages) wrote:
>
> It is designed so that usage requires the minimum number of lines of
> code for complete usage (including error handling checks):
>
> [[
> // When we already checked that 'size' is >= 1
> // and truncation is not an issue:
>
> strcpys_np(size, dest, src, NULL);

Also, given how unlikely this case is, I have in my code:
`[[gnu::warn_unused_result]]`

I forgot to talk about it in the definition I sent. I would put that
attribute in the glibc definition, if this is added to glibc.

To ignore it, a simple cast of the result to `(void)` should be enough
(or a more complex macro, like `UNUSED(strcpys_np(...));`).
>
> [[
>
> #include <string.h>
> #include <sys/types.h>
>
>
> [[gnu::nonnull]]
> ssize_t strscpy_np(ssize_t size,
>                    char dest[static restrict size],
>                    const char src[static restrict size])
> {
>     ssize_t len;
>
>     if (size <= 0)
>         return -1;
>
>     len = strnlen(src, size - 1);
>     memcpy(dest, src, len);
>     dest[len] = '\0';
>
>     return len;
> }
>
> [[gnu::nonnull(2, 3)]]
[[gnu::warn_unused_result]]
> int strcpys_np(ssize_t size,
>                char dest[static restrict size],
>                const char src[static restrict size],
>                ssize_t *restrict len)
> {
>     ssize_t l;
>
>     l = strscpy_np(size, dest, src);
>     if (len)
>         *len = l;
>
>     if (l == -1)
>         return -1;
>     if (l >= size)
>         return 1;
>     return 0;
> }
>
> ]]

--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

2021-06-28 08:16:53

by David Laight

[permalink] [raw]
Subject: RE: [RFC] strcpys(): New function for copying strings safely

From: Alejandro Colomar
> Sent: 27 June 2021 20:47
>
> On 6/27/21 9:26 PM, Alejandro Colomar (man-pages) wrote:
> >
> > It is designed so that usage requires the minimum number of lines of
> > code for complete usage (including error handling checks):
> >
> > [[
> > // When we already checked that 'size' is >= 1
> > // and truncation is not an issue:
> >
> > strcpys_np(size, dest, src, NULL);
>
> Also, given how unlikely this case is, I have in my code:
> `[[gnu::warn_unused_result]]`

warn_unused_result is such a PITA it should never have been invented.
At least no one seems to have been stupid enough to apply it to fprintf()
(you need to do fflush() before looking for errors).

In most cases strcpy() is safe - but if the inputs are 'corrupt'
horrid things happen - so you need truncation for safety.
But you really may not care whether the data got truncated.

The other use is where you want a sequence of:
len *= str_copy(dest + len, dest_len - len, src);
But don't really want to test for overflow after each call.

In this case returning the buffer length (including the added
'\0' on truncation works quite nicely.
No chance of writing outside the buffer, safe truncation and
a simple 'len == dest_len' check for overflow.

OTOH there are already too many string copy functions.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2021-06-28 12:14:12

by Alejandro Colomar

[permalink] [raw]
Subject: Re: [RFC] strcpys(): New function for copying strings safely

On 6/28/21 2:00 PM, Alejandro Colomar (man-pages) wrote:
> l = strscat(n - l, dest + l, src2);

Hmm, that's a bug; I wrote it too fast.

Either

l += strscpy(n - l, dest + l, src2);

or

l = strscat(n - l, dest, src2);

should be used instead.

--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

2021-06-28 12:34:50

by Alejandro Colomar

[permalink] [raw]
Subject: Re: [RFC] strcpys(): New function for copying strings safely

Hello David,

On 6/28/21 10:15 AM, David Laight wrote:
> From: Alejandro Colomar
>> Sent: 27 June 2021 20:47
>>
>> On 6/27/21 9:26 PM, Alejandro Colomar (man-pages) wrote:
>>>
>>> It is designed so that usage requires the minimum number of lines of
>>> code for complete usage (including error handling checks):
>>>
>>> [[
>>> // When we already checked that 'size' is >= 1
>>> // and truncation is not an issue:
>>>
>>> strcpys_np(size, dest, src, NULL);
>>
>> Also, given how unlikely this case is, I have in my code:
>> `[[gnu::warn_unused_result]]`
>
> warn_unused_result is such a PITA it should never have been invented.
> At least no one seems to have been stupid enough to apply it to fprintf()
> (you need to do fflush() before looking for errors).

Okay. After some thought, I think removing it is better.

>
> In most cases strcpy() is safe - but if the inputs are 'corrupt'
> horrid things happen - so you need truncation for safety.
> But you really may not care whether the data got truncated.

Yes, in most cases, expecially with literal strings ("xxx") and known
buffers, strcpy() is completely safe.

And yes, if the input is untrusted (may be corrupt or not), you need to
be prepared for truncation. You don't need to truncate, hey, maybe the
input is well-formed, but you need a function that may truncate if needed.

For that, we have strncpy(3), strlcpy(3bsd), and strscpy (kernel).

strncpy(3) is not for strings; it is for fixed length buffers. It has
high performance penalties, and yet doesn't guarantee a C-string.
DISCARDED.

strlcpy(3bsd) still requires that the input is a C-string, so it can't
be used with untrusted strings. It really is broken, as Linus said.
Its only valid use is if the input string is known (maybe a literal),
but the output buffer may be smaller than the input string. DISCARDED.

strscpy (kernel) is great. BUT it is not supported by glibc. The
reason for not including it (actually Ulrich used it for not including
strlcpy, but it also applies to strscpy) is that it doesn't report an
error when the string is truncated. I think that strscpy adds value to
the current strcpy variants, and should be added to glibc.

And then, coming back to truncation, yes, in some cases, truncation is
wanted, and not an error, for which strscpy is great (so please add it
to glibc :), but then there are other cases where truncation is not
wanted, but yet we deal with untrusted strings. That is the case that I
think Ulrich mentioned that wasn't covered by strlcpy (and isn't yet
covered by strscpy). And the reason to add this function.

Let's say a user introduces a username; it's an untrusted string, we
need to process it at least with strscpy(). But if it is corrupted (too
long), we don't want to silently store a truncated username; we want to
inform the user, so that he introduces a new (hopefully valid) username.
For that, strcpys() as defined in the previous email, is necessary.

>
> The other use is where you want a sequence of:
> len *= str_copy(dest + len, dest_len - len, src);
> But don't really want to test for overflow after each call.

This is a legitimate use of strscpy(). Please, add it to glibc, as
there is no libc function to do that.

>
> In this case returning the buffer length (including the added
> '\0' on truncation works quite nicely.
> No chance of writing outside the buffer, safe truncation and
> a simple 'len == dest_len' check for overflow.
>
> OTOH there are already too many string copy functions.

There are, but because the standard ones don't serve all purposes
correctly, so people need to develop their own. If libc provided the
necessary functions, less custom string copy functions would be
necessary, as Christoph said a long time ago, which is a good thing.

As I see it, there are the following, each of which has its valid usage:

strcpy(3):
known input && known buffer
strncpy(3):
not for C strings; only for fixed width buffers of characters!
strlcpy)3bsd):
known input && unknown buffer
Given that performance-wise it's similar to strscpy(),
it should probably always be replaced by strscpy().
strscpy():
unknown input && truncation is silently ignored
Except for performance-critical applications,
this call may replace strcpy(3), to add some extra safety.
strcpys():
unknown input && truncation may be an error (or not).
This call can replace strscpy() in most cases,
simplifying usage.
The only case I can see that strscpy() is superior
is for chains of strscpy(), where I'd only use strcpys()
in the last one (if any strscpy() in the chain has been
trunncated, so will the last strcpys() (unless it's "")).


For the reasons above, I propose both strscpy() and strcpys() for
inclusion in glibc. Both trivial sources are in the previous email.

Thanks,

Alex


---

BTW, I did some research in the kernel and glibc sources to see how are
str*cpy functions being used there:


user@sqli:~/src/gnu/glibc$ grep -rn 'strcpy(' * | sort -R | head -n 7
string/tester.c:1126: (void) strcpy(one, "abcd");
resolv/nss_dns/dns-host.c:532: strcpy(qp, "ip6.arpa");
dirent/tst-fdopendir.c:70: strcpy(fname, "/tmp/dXXXXXX");
string/tester.c:1006: cp = strcpy(one, "abc");
timezone/zic.c:459: return result ? strcpy(result, str) : result;
timezone/zic.c:2886: strcpy(startbuf, zp->z_format);
resolv/compat-gethnamaddr.c:393: strcpy(bp, qname);



- string/tester.c:1126: (void) strcpy(one, "abcd");
Okay.

- resolv/nss_dns/dns-host.c:532: strcpy(qp, "ip6.arpa");
Okay.

- dirent/tst-fdopendir.c:70: strcpy(fname, "/tmp/dXXXXXX");
Okay.

- string/tester.c:1006: cp = strcpy(one, "abc");
Okay.

- timezone/zic.c:459: return result ? strcpy(result, str) : result;
Okay.

- timezone/zic.c:2886: strcpy(startbuf, zp->z_format);
I couldn't find a guarantee that zp->z_format is NUL-terminated.
Possible buffer overrun.
I couldn't find a guarantee that zp->z_format fits in startbuf.
Possible buffer overflow.
This call could be replaced, at least, by strlcpy. The calls to strchr()
previous to that strcpy() may also benefit from using strnchr().

- resolv/compat-gethnamaddr.c:393: strcpy(bp, qname);
Okay.

user@sqli:~/src/linux/linux$ grep -rn 'strcpy(' * | sort -R | head -n 3
drivers/platform/x86/wmi.c:346: strcpy(method, "WQ");
sound/pci/atiixp.c:1305: strcpy(pcm->name, "ATI IXP IEC958 (AC97)");
sound/pci/intel8x0.c:1439: strcpy(name, "Intel ICH");

user@sqli:~/src/linux/linux$ grep -rn 'strlcpy(' * | sort -R | head -n 3
tools/perf/builtin-buildid-cache.c:124: strlcpy(from_dir, filename,
sizeof(from_dir));
drivers/usb/gadget/function/u_ether.c:148: strlcpy(p->version,
UETH__VERSION, sizeof(p->version));
drivers/scsi/qla4xxx/ql4_mbx.c:1615: strlcpy(username, chap_table->name,
QL4_CHAP_MAX_NAME_LEN);

user@sqli:~/src/linux/linux$ grep -rn 'strscpy(' * | sort -R | head -n 3
sound/core/pcm.c:732: strscpy(pcm->id, id, sizeof(pcm->id));
sound/usb/hiface/pcm.c:597: strscpy(pcm->name, "USB-SPDIF Audio",
sizeof(pcm->name));
crypto/crypto_user_stat.c:54: strscpy(rcipher.type, "cipher",
sizeof(rcipher.type));




- drivers/platform/x86/wmi.c:346: strcpy(method, "WQ");
Okay

- sound/pci/atiixp.c:1305: strcpy(pcm->name, "ATI IXP IEC958 (AC97)");
Okay

- sound/pci/intel8x0.c:1439: strcpy(name, "Intel ICH");
Okay
The sprintf() 2 lines above is also okay.


- tools/perf/builtin-buildid-cache.c:124: strlcpy(from_dir, filename,
sizeof(from_dir));
Okay. BUT no libc support!

- drivers/usb/gadget/function/u_ether.c:148: strlcpy(p->version,
UETH__VERSION, sizeof(p->version));
Okay. BUT no libc support!
However, in the next line, it's not obvious where dev->gadget->name
comes from.
eth_get_drvinfo() is a callback (it's stored in a pointer), so I can't
follow it.
If it were an untrusted string (not NUL-terminated), it would be a bug.

- drivers/scsi/qla4xxx/ql4_mbx.c:1615: strlcpy(username,
chap_table->name, QL4_CHAP_MAX_NAME_LEN);
Okay. BUT no libc support!


- sound/core/pcm.c:732: strscpy(pcm->id, id, sizeof(pcm->id));
Okay. BUT no libc support!

- sound/usb/hiface/pcm.c:597: strscpy(pcm->name, "USB-SPDIF Audio",
sizeof(pcm->name));
Okay. BUT no libc support!

- crypto/crypto_user_stat.c:54: strscpy(rcipher.type, "cipher",
sizeof(rcipher.type));
Okay. BUT no libc support!


None of the above checked for truncation, so I assume it's not an issue
there.



--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

2021-06-28 12:35:18

by Alejandro Colomar

[permalink] [raw]
Subject: Re: [RFC] strcpys(): New function for copying strings safely

On 6/28/21 1:32 PM, Alejandro Colomar (man-pages) wrote:
>>
>> The other use is where you want a sequence of:
>>     len *= str_copy(dest + len, dest_len - len, src);
>> But don't really want to test for overflow after each call.
>
> This is a legitimate use of strscpy().  Please, add it to glibc, as
> there is no libc function to do that.
>
>>
>> In this case returning the buffer length (including the added
>> '\0' on truncation works quite nicely.
>> No chance of writing outside the buffer, safe truncation and
>> a simple 'len == dest_len' check for overflow.
>>
>> OTOH there are already too many string copy functions.
>
> There are, but because the standard ones don't serve all purposes
> correctly, so people need to develop their own.  If libc provided the
> necessary functions, less custom string copy functions would be
> necessary, as Christoph said a long time ago, which is a good thing.
>
> As I see it, there are the following, each of which has its valid usage:
>
> strcpy(3):
>     known input && known buffer
> strncpy(3):
>     not for C strings; only for fixed width buffers of characters!
> strlcpy)3bsd):
>     known input && unknown buffer
>     Given that performance-wise it's similar to strscpy(),
>     it should probably always be replaced by strscpy().
> strscpy():
>     unknown input && truncation is silently ignored
>     Except for performance-critical applications,
>     this call may replace strcpy(3), to add some extra safety.
> strcpys():
>     unknown input && truncation may be an error (or not).
>     This call can replace strscpy() in most cases,
>     simplifying usage.
>     The only case I can see that strscpy() is superior
>     is for chains of strscpy(), where I'd only use strcpys()
>     in the last one (if any strscpy() in the chain has been
>     trunncated, so will the last strcpys() (unless it's "")).
>
>

BTW, for chains of str_cpy(), a strscat() and a strcats() function
should probably replace chained strcpy()s, so it would be something like:

l = strscpy(n, dest, src);
l = strscat(n - l, dest + l, src2);
l = strscat(n - l, dest + l, src3);
...
if (strcats(n - l, dest + l, srcN, NULL))
goto handle_truncation_or_error;

And the user should make sure that srcN is not "" unless he doesn't care
about truncation. Otherwise, check at every step.

I'll send my source code for strscat and strcats, if strscpy and strcpys
are considered for addition.


--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/