2022-05-23 06:19:02

by Keno Fischer

[permalink] [raw]
Subject: Re: arm64 equivalents of PR_SET_TSC/ARCH_SET_CPUID

On Sun, May 22, 2022 at 11:35 AM Marc Zyngier <[email protected]> wrote:
> From what I understand, you are relying on the TSC being disabled in
> the tracee and intercepting the signal that gets delivered when it
> accesses the counter. Is that correct?

Yes, this is correct. The way that these kernel APIs work is that they turn
any use of `rdtsc` (respectively `cpuid`) into SIGSEGV signals that the
ptracer intercepts and emulates. It's not particularly pretty, but it
works reasonably well in practice.

> Assuming I'm right, I think it'd make a lot more sense if there was a
> first class ptrace option, if only because this would mandate the
> kernel to start trapping things that are not trapped today.

I'm a bit nervous about "first class ptrace option" if only because ptrace
is already a complicated mess and having spent a significant amount
of time hunting down architecture-specific ptrace quirks, I'd be quite
hesitant to introduce another one without a very strong justification.
If the proposed mechanism is essentially signal-equivalent
(i.e. it causes a ptrace stop and lets the ptracer emulate the instruction),
then I'd strongly advocate for making it an actual, proper signal which
has well-understood quirks (as the PR_SET_TSC/ARCH_SET_CPUID
do on x86).

The other consideration here is that disabling these sorts of counters
may have non-ptrace applications e.g. sandboxes may want to disable
these sorts of capabilities to harden against timing attacks, which
may suggest that ptrace isn't the right place for it.

If we're considering something more fancy, that's a different story of
course. Naturally causing a ptrace trap on these instructions has
significant overhead, but because they're usually fast, existing userspace
is not particularly judicious in their use (the same issue happens on x86
of course). One could imagine a light-weight kernel-level record/replay
capability where all accesses to these registers are traced and dumped
into a buffer (with the corresponding capability to feed the values from
a buffer). That kind of capability feels like a more natural fit for the perf
subsystem, which already has capabilities to shuffle trace buffers around.

> It also begs the question of the fate of CNTFRQ_EL0, since you want to
> be able to replay traces from one system to another (and the counter
> is meaningless without the frequency).

Yes, it'd have to be interceptable also.

> Finally, what of the VDSO, which is by far the most common user of the
> counter? I can totally imagine the VDSO getting stuck if emulation is
> used and the sequence counter moves synchronously with the traps
> (which is why we disable the VDSO when trapping CNTVCT_EL0).

Could you elaborate on this concern? rr does disable the vdso currently,
so it wouldn't be a problem from that perspective, but I don't understand
what you mean by the VDSO getting "stuck".

> > 2. Likewise for ARCH_SET_CPUID
>
> We don't just emulate a single register, but a whole class of them. If
> you are to present a different view for any of those, you'll need to
> handle the lot (I really can't see why one would be more important
> than the others).
>
> So SET_CPUID really is the wrong tool. I'd rather there was (again) an
> API that described exactly that.

I'm assuming these register values are all fixed as long as the process
doesn't get migrated between CPU cores? In that case, it seems quite
doable to introduce another ptrace regset that just has the register
values for everything that could potentially be emulated (and is extensible
for future additions). We'd need to think through the exact semantics
in the ordinary course if one of the emulated registers does change,
but it seems like a solvable issue.

Keno