2016-04-08 21:04:23

by Andrew Kelley

Subject: alternatives to null-terminated byte arrays in syscalls in the future?

The open syscall looks like this:

SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, umode_t, mode)

filename is a null terminated byte array. Null termination is one way
to handle lengths of byte arrays, but arguably a better way is to keep
track of the length in a separate field. Many programming languages
use pointer + length instead of null termination for various reasons.

When it's time to make a syscall such as open, software which does not
have a null character at the end of byte arrays is forced to allocate
memory, do a memcpy, insert a null byte, perform the open syscall,
then deallocate the memory.
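
The sequence described above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: the helper name open_counted is hypothetical and not part of any library, and a real runtime might use a stack buffer for short paths instead of malloc.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: open a path given as pointer + length,
 * where the bytes are NOT followed by a NUL terminator. */
int open_counted(const char *path, size_t len, int flags, mode_t mode)
{
    char *tmp = malloc(len + 1);      /* allocate memory: path plus one NUL */
    if (tmp == NULL)
        return -1;                    /* malloc has set errno (ENOMEM) */

    memcpy(tmp, path, len);           /* do a memcpy */
    tmp[len] = '\0';                  /* insert a null byte */

    int fd = open(tmp, flags, mode);  /* perform the open syscall */

    int saved_errno = errno;          /* free() may disturb errno */
    free(tmp);                        /* deallocate the memory */
    errno = saved_errno;

    return fd;
}
```

A caller holding a counted string would call something like open_counted(s.ptr, s.len, O_RDONLY, 0); the extra allocation and copy happen on every such call.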

What are the chances that in the future, Linux will have alternate
syscalls which accept byte array parameters where one can pass the
length of the byte array explicitly instead of using a null byte?

Regards,
Andrew Kelley


2016-04-08 21:11:14

by Denys Vlasenko

Subject: Re: alternatives to null-terminated byte arrays in syscalls in the future?

On Fri, Apr 8, 2016 at 11:04 PM, Andrew Kelley <[email protected]> wrote:
> The open syscall looks like this:
>
> SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, umode_t, mode)
>
> filename is a null terminated byte array. Null termination is one way
> to handle lengths of byte arrays, but arguably a better way is to keep
> track of the length in a separate field. Many programming languages
> use pointer + length instead of null termination for various reasons.
>
> When it's time to make a syscall such as open, software which does not
> have a null character at the end of byte arrays is forced to allocate
> memory, do a memcpy, insert a null byte, perform the open syscall,
> then deallocate the memory.

In many cases, it's possible to just add the NUL byte instead.

> What are the chances that in the future, Linux will have alternate
> syscalls which accept byte array parameters where one can pass the
> length of the byte array explicitly instead of using a null byte?

0% chance. The amount of PITA to make that happen far outweighs
possible benefits.

2016-04-08 21:22:10

by Andrew Kelley

Subject: Re: alternatives to null-terminated byte arrays in syscalls in the future?

On Fri, Apr 8, 2016 at 2:10 PM, Denys Vlasenko <[email protected]> wrote:
> On Fri, Apr 8, 2016 at 11:04 PM, Andrew Kelley <[email protected]> wrote:
>> The open syscall looks like this:
>>
>> SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, umode_t, mode)
>>
>> filename is a null terminated byte array. Null termination is one way
>> to handle lengths of byte arrays, but arguably a better way is to keep
>> track of the length in a separate field. Many programming languages
>> use pointer + length instead of null termination for various reasons.
>>
>> When it's time to make a syscall such as open, software which does not
>> have a null character at the end of byte arrays is forced to allocate
>> memory, do a memcpy, insert a null byte, perform the open syscall,
>> then deallocate the memory.
>
> In many cases, it's possible to just add the NUL byte instead.

A counterexample is the Rust standard library:
https://github.com/rust-lang/rust/blob/7e996943784dcbabed433b6906510298ad80903b/src/libstd/sys/unix/fs.rs#L420-L423
https://github.com/rust-lang/rust/blob/7e996943784dcbabed433b6906510298ad80903b/src/libstd/sys/unix/fs.rs#L534-L536

The problem is that the open syscall sits at a low level in a given
application, so it is usually abstracted in a way that does not
guarantee space to add the NUL byte. Implementations therefore have to
take the safe bet of copying memory.

>
>> What are the chances that in the future, Linux will have alternate
>> syscalls which accept byte array parameters where one can pass the
>> length of the byte array explicitly instead of using a null byte?
>
> 0% chance. The amount of PITA to make that happen far outweighs
> possible benefits.

OK, fair enough. If I proposed a patch to the mailing list, would that
change the chances at all?

2016-04-09 12:37:52

by Alan Cox

Subject: Re: alternatives to null-terminated byte arrays in syscalls in the future?

On Fri, 8 Apr 2016 14:04:00 -0700
Andrew Kelley <[email protected]> wrote:

> The open syscall looks like this:
>
> SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, umode_t, mode)
>
> filename is a null terminated byte array. Null termination is one way
> to handle lengths of byte arrays, but arguably a better way is to keep
> track of the length in a separate field. Many programming languages
> use pointer + length instead of null termination for various reasons.
>
> When it's time to make a syscall such as open, software which does not
> have a null character at the end of byte arrays is forced to allocate
> memory, do a memcpy, insert a null byte, perform the open syscall,
> then deallocate the memory.

That should only happen if the language wasn't carefully thought out. If
your name objects include both the length and the space available so you
can do array offset validation then

- you can check if the \0 will fit
- your app or interpreter can add space for \0 or even include it
specifically
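
One possible reading of the two points above, as a minimal C sketch. The names struct name_buf, name_buf_terminate and name_buf_open are hypothetical, and returning ENOBUFS when there is no spare byte is an arbitrary choice for illustration; a real runtime would more likely fall back to the copying path at that point.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>

/* Hypothetical name object carrying both the length in use and the
 * space available, as suggested above. */
struct name_buf {
    char  *data;  /* start of the bytes */
    size_t len;   /* bytes in use, not counting any terminator */
    size_t cap;   /* total bytes available at data */
};

/* Check whether the \0 will fit; if so, write it in place.
 * Returns 1 on success, 0 if there is no spare byte. */
int name_buf_terminate(struct name_buf *n)
{
    if (n->len >= n->cap)
        return 0;
    n->data[n->len] = '\0';  /* inside capacity, outside the logical length */
    return 1;
}

/* Open without allocating or copying when the terminator fits. */
int name_buf_open(struct name_buf *n, int flags, mode_t mode)
{
    if (!name_buf_terminate(n)) {
        errno = ENOBUFS;     /* assumption: caller falls back to a copy */
        return -1;
    }
    return open(n->data, flags, mode);
}
```

If the runtime always reserves one byte beyond len when it builds a name, the check never fails and no copy is needed.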

I would also be very surprised if most applications doing such
conversions even showed up meaningfully in the profiling. Pathname
syscalls are not the most common ones being executed.

Alan