Two and a half years ago I sent out 3 patches and a cover letter that
said this[1]:
At Linux Plumbers, Andy Lutomirski approached me to tell me that the
syscall_get_arguments() implementation in x86 was horrible and gcc
certainly gets it wrong. He said that since the tracepoints only pass
in 0 and 6 for i and n respectively, it should be optimized for that case.
Inspecting the kernel, I discovered that all users pass in 0 for i and
only one file passes in something other than 6 for the number of arguments.
That code happens to be my own code used for the special syscall tracing.
That can easily be converted to just using 0 and 6 as well, and only copying
what is needed. Which is probably the faster path anyway for that case.
I haven't run the numbers (I can do that when I get some time), but since
pretty much all use cases use 0 and 6 and that would allow these functions
not to need strange logic to handle odd cases, I think this is still a win.
It received positive comments, but Linus also asked to remove the separate
arg pointers and replace them with a single structure and fill that
instead. But for some reason, this got pushed aside and forgotten (it
probably had to do with the fact that I left Red Hat shortly after this).
Recently, it was brought back up again[2] and I decided to dust off these
patches and resubmit them. I also added one more patch to do the same
for syscall_set_arguments() that I did for syscall_get_arguments(), even
though syscall_set_arguments() currently has no callers (and never has had
any). But we are told that in the near future it may have one.
The changes do optimize the logic a little, but for most archs I just kept
the same logic (loops and such) as I don't have a way to test it, and
didn't want to break the logic.
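To illustrate the calling convention change, here is a minimal userspace
sketch. It is not the kernel code: struct mock_regs, get_args_old() and
get_args_new() are hypothetical names made up for this example, and the
assert() only stands in for whatever range check an old-style
implementation needs.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for an architecture's saved syscall registers. */
struct mock_regs {
	unsigned long arg[6];
};

/* Old convention: copy n arguments starting at index i. */
static inline void get_args_old(const struct mock_regs *regs, unsigned int i,
				unsigned int n, unsigned long *args)
{
	/* The (i, n) pair forces a range check on every call. */
	assert(i + n <= 6);
	memcpy(args, &regs->arg[i], n * sizeof(args[0]));
}

/* New convention: always copy all six arguments, no start or count. */
static inline void get_args_new(const struct mock_regs *regs,
				unsigned long *args)
{
	memcpy(args, regs->arg, sizeof(regs->arg));
}
```

A caller that only wants the first few arguments fetches all six into a
local array and reads what it needs from there.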
I added a new struct syscall_info that holds seccomp_data and also
includes a stack pointer (sp) field. I would change seccomp_data,
but because it's in include/uapi/linux/seccomp.h I didn't want to
touch it and break userspace. Perhaps we could add the field at the
end, but I didn't want to chance it (unless others say it's OK).
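As a rough userspace sketch of the wrapper idea: the sketch_* names
below are hypothetical, the first struct mirrors the uapi seccomp_data
layout with stdint types, and the field order of the wrapper is an
assumption for illustration rather than a copy of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the layout of struct seccomp_data in the uapi header. */
struct sketch_seccomp_data {
	int nr;				/* syscall number */
	uint32_t arch;			/* AUDIT_ARCH_* value */
	uint64_t instruction_pointer;
	uint64_t args[6];
};

/* The uapi structure is left untouched, so its size stays at 64 bytes. */
static_assert(sizeof(struct sketch_seccomp_data) == 64,
	      "wrapped uapi layout must not change");

/* Wrapper that adds the stack pointer next to the unchanged uapi data. */
struct sketch_syscall_info {
	uint64_t sp;
	struct sketch_seccomp_data data;
};

/* Fill the wrapper in one place; arch is left zeroed in this sketch. */
static inline void sketch_fill_info(struct sketch_syscall_info *info, int nr,
				    uint64_t sp, uint64_t ip,
				    const uint64_t args[6])
{
	memset(info, 0, sizeof(*info));
	info->sp = sp;
	info->data.nr = nr;
	info->data.instruction_pointer = ip;
	memcpy(info->data.args, args, sizeof(info->data.args));
}
```

Wrapping rather than extending means nothing userspace can see changes
size or layout, which is the concern raised above.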
I ran these through zero-day-bot and compile tested these changes for
all architectures except for csky, which I do not have a cross compiler
for.
Note the following archs fail normal builds, but they fail the same
with these patches:
arc
h8300
parisc64
Note, you may notice that I have "(Red Hat)" as the author of the
first three patches (even though they are signed off by "(VMware)").
This is because those patches were originally written while I was
working for Red Hat. But as I forward ported them while working for
VMware, my signed-off-by reflects that.
[1] - https://lore.kernel.org/lkml/[email protected]/T/#u
[2] - https://lore.kernel.org/lkml/[email protected]/T/#u
Steven Rostedt (Red Hat) (3):
      ptrace: Remove maxargs from task_current_syscall()
      tracing/syscalls: Pass in hardcoded 6 into syscall_get_arguments()
      syscalls: Remove start and number from syscall_get_arguments() args

Steven Rostedt (VMware) (1):
      syscalls: Remove start and number from syscall_set_arguments() args
----
arch/arc/include/asm/syscall.h | 7 +-
arch/arm/include/asm/syscall.h | 47 ++---------
arch/arm64/include/asm/syscall.h | 46 ++---------
arch/c6x/include/asm/syscall.h | 79 ++++---------------
arch/csky/include/asm/syscall.h | 26 ++-----
arch/h8300/include/asm/syscall.h | 34 ++------
arch/hexagon/include/asm/syscall.h | 4 +-
arch/ia64/include/asm/syscall.h | 13 +---
arch/ia64/kernel/ptrace.c | 7 +-
arch/microblaze/include/asm/syscall.h | 8 +-
arch/mips/include/asm/syscall.h | 3 +-
arch/mips/kernel/ptrace.c | 2 +-
arch/nds32/include/asm/syscall.h | 62 +++------------
arch/nios2/include/asm/syscall.h | 84 ++++----------------
arch/openrisc/include/asm/syscall.h | 12 +--
arch/parisc/include/asm/syscall.h | 30 ++-----
arch/powerpc/include/asm/syscall.h | 15 ++--
arch/riscv/include/asm/syscall.h | 24 ++----
arch/s390/include/asm/syscall.h | 28 +++----
arch/sh/include/asm/syscall_32.h | 47 +++--------
arch/sh/include/asm/syscall_64.h | 8 +-
arch/sparc/include/asm/syscall.h | 11 ++-
arch/um/include/asm/syscall-generic.h | 78 +++----------------
arch/x86/include/asm/syscall.h | 142 ++++++++--------------------------
arch/xtensa/include/asm/syscall.h | 33 ++------
fs/proc/base.c | 17 ++--
include/asm-generic/syscall.h | 21 ++---
include/linux/ptrace.h | 11 ++-
include/trace/events/syscalls.h | 2 +-
kernel/seccomp.c | 2 +-
kernel/trace/trace_syscalls.c | 9 ++-
lib/syscall.c | 57 ++++++--------
32 files changed, 247 insertions(+), 722 deletions(-)
The whole series looks fine to me.
I still suspect that we should just remove the syscall_set_arguments()
thing entirely, but even without that, the cleanup of the calling
convention is at least an improvement.
Linus
On Fri, 29 Mar 2019 10:24:58 -0700
Linus Torvalds <[email protected]> wrote:
> The whole series looks fine to me.
Great! I may just send a pull request to you, after some fixes (see
below).
>
> I still suspect that we should just remove the syscall_set_arguments()
> thing entirely, but even without that, the cleanup of the calling
> convention is at least an improvement.
I'll keep it around for now, but this should go as a warning to Dmitry,
to get something using it soon, or they may be dropped.
Also, Dmitry found a few bugs with the current
syscall_set/get_arguments() on some of the archs (riscv and csky). Which
I'll add at the front of this series and update my changes to keep the
same logic.
Then I'll post a non RFC version.
-- Steve
On Fri, Mar 29, 2019 at 10:40 AM Steven Rostedt <[email protected]> wrote:
>
> I'll keep it around for now, but this should go as a warning to Dmitry,
> to get something using it soon, or they may be dropped.
I don't think _that_ is the argument.
Quite the reverse: nobody has ever used it, why have it around, and
much less try to hurry some new pointless user to use it?
The "get system call arguments" code at least can be used somewhat
generically for things like tracing and strace.
The "set system call arguments" can NOT.
Anybody who sets system call arguments had better intimately know the
details anyway, and any user code has to have any legacy ptrace
interface anyway for all but the newest kernels.
So I will just say "NO". No new stupid interface that doesn't have a
truly immensely convincing reason for it.
And definitely not hurried along by "nobody has ever usefully done
this before and now people are noticing how bad the interfaces are, so
let's cobble some sh*t together quickly".
Linus
On Fri, Mar 29, 2019 at 11:12:18AM -0700, Linus Torvalds wrote:
> On Fri, Mar 29, 2019 at 10:40 AM Steven Rostedt <[email protected]> wrote:
> >
> > I'll keep it around for now, but this should go as a warning to Dmitry,
> > to get something using it soon, or they may be dropped.
>
> I don't think _that_ is the argument.
>
> Quite the reverse: nobody has ever used it, why have it around, and
> much less try to hurry some new pointless user to use it?
>
> The "get system call arguments" code at least can be used somewhat
> generically for things like tracing and strace.
>
> The "set system call arguments" can NOT.
>
> Anybody who sets system call arguments had better intimately know the
> details anyway, and any user code has to have any legacy ptrace
> interface anyway for all but the newest kernels.
In strace we have a feature called system call tampering.
Initially limited to system call number and return code tampering,
it's being extended to tamper with system call arguments as well.
Currently it's implemented in strace using traditional
PTRACE_SETREGSET/PTRACE_SETREGS/PTRACE_POKEUSER interfaces.
These interfaces indeed require intimate knowledge of the target
architecture. Fortunately, strace already has this intimate knowledge,
but the corresponding code would be much more trivial if an
architecture-agnostic ptrace interface for setting syscall info
existed in the kernel.
I didn't plan to start the discussion about this new ptrace command
before PTRACE_GET_SYSCALL_INFO [1] finally landed into the kernel.
For us userspace people it takes a lot of time not only to get a new
kernel interface accepted, but even to reintroduce an old internal kernel
interface that was removed due to lack of users. For example, it took me
roughly 4 months to get a relatively simple partial revert of commit
5e937a9ae913 accepted into linux-next.
This was the reason why I asked to delay the removal of
syscall_set_arguments() until PTRACE_GET_SYSCALL_INFO
is merged into the kernel.
[1] https://lore.kernel.org/lkml/[email protected]/
--
ldv
On Fri, 29 Mar 2019 10:40:45 PDT (-0700), [email protected] wrote:
> On Fri, 29 Mar 2019 10:24:58 -0700
> Linus Torvalds <[email protected]> wrote:
>
>> The whole series looks fine to me.
>
> Great! I may just send a pull request to you, after some fixes (see
> below).
>
>>
>> I still suspect that we should just remove the syscall_set_arguments()
>> thing entirely, but even without that, the cleanup of the calling
>> convention is at least an improvement.
>
> I'll keep it around for now, but this should go as a warning to Dmitry,
> to get something using it soon, or they may be dropped.
>
> Also, Dmitry found a few bugs with the current
> syscall_set/get_arguments() on some of the archs (riscv and csky). Which
> I'll add at the front of this series and update my changes to keep the
> same logic.
Thanks. I'm happy to have you take the RISC-V fix through your tree.
>
> Then I'll post a non RFC version.
>
> -- Steve