2024-04-04 19:58:31

by Beau Belgrave

Subject: Copying TLS/user register data per perf-sample?

Hello,

I'm looking into the possibility of capturing user data that is pointed
to by a user register (e.g., fs/gs for TLS on x86-64) for each sample via
perf_events.

I was hoping to find a way to do this similar to PERF_SAMPLE_STACK_USER.
I think it could even use roughly the same ABI in the perf ring buffer.
Or it may be possible by some kprobe linked to the perf sample function.
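
For context, here is a minimal sketch of how a profiler requests the existing user-register and user-stack dumps today, which is the model the idea above would follow. The helper name `init_user_sample_attr`, the 99 Hz CPU-clock event, and the caller-supplied register mask are illustrative assumptions. Only the `perf_event_attr` fields and the `PERF_SAMPLE_*` flags come from the existing perf ABI. The sketch contains no TLS-specific sample type, because none is known to exist.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/perf_event.h>

/*
 * Fill in a perf_event_attr that asks for user registers and a copy of
 * the user stack with every sample. Returns 0 on success, -1 if the
 * requested stack size looks invalid.
 *
 * reg_mask is architecture specific (bits index the PERF_REG_* enum for
 * the target architecture), so it is left to the caller here.
 */
int init_user_sample_attr(struct perf_event_attr *attr,
                          uint64_t reg_mask, uint32_t stack_bytes)
{
    assert(attr != NULL);

    /* Assumption: the kernel wants a non-zero, 8-byte aligned size
     * below 64 KiB; reject anything else before calling into it. */
    if (stack_bytes == 0 || stack_bytes % 8 != 0 || stack_bytes > 65528)
        return -1;

    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_SOFTWARE;
    attr->config = PERF_COUNT_SW_CPU_CLOCK;
    attr->freq = 1;            /* interpret sample_freq, not sample_period */
    attr->sample_freq = 99;    /* illustrative sampling rate */

    attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID |
                        PERF_SAMPLE_REGS_USER | PERF_SAMPLE_STACK_USER;
    attr->sample_regs_user = reg_mask;      /* which user registers to dump */
    attr->sample_stack_user = stack_bytes;  /* how much user stack to copy */
    return 0;
}
```

The filled-in structure would then be passed to perf_event_open(2). A TLS capture along the same lines would presumably need a new sample type plus an offset/size pair, which is exactly the piece that is missing.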

This would allow a profiler to collect TLS (or other values) on x86-64. In
the OpenTelemetry profiling SIG [1], we are trying to find a fast way to
grab a tracing association on a per-thread basis. The team at Elastic has
a bespoke way to do this [2]; however, I'd like to see a more general way
to achieve this. The folks I've been talking with seem open to the idea of
simply having a TLS value for this that we could capture upon each sample.
We could then state that OpenTelemetry SDKs should have a TLS value for
span correlation. However, we need a way to sample the TLS value(s) when a
sampling event is generated.
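
To make "a value pointed to by fs" concrete, here is a user-space sketch that assumes x86-64 Linux with a conventional TLS layout. It finds the fs base, computes the offset of a per-thread slot relative to it, and reads the slot back through base + offset. `otel_span_id` is a hypothetical slot name, not something any SDK is known to define, and the helper names are illustrative. A sampler would have to perform the equivalent read at sample time. This sketch only demonstrates the address arithmetic from inside the thread itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#if defined(__x86_64__)
#include <asm/prctl.h>   /* ARCH_GET_FS */
#endif

/* Hypothetical per-thread slot an SDK might publish for span correlation. */
__thread uint64_t otel_span_id;

/* Store the calling thread's fs base in *base.
 * Returns 0 on x86-64, -1 on other architectures or on failure. */
int get_fs_base(uintptr_t *base)
{
#if defined(__x86_64__)
    unsigned long fs = 0;

    if (syscall(SYS_arch_prctl, ARCH_GET_FS, &fs) != 0)
        return -1;
    *base = (uintptr_t)fs;
    return 0;
#else
    (void)base;
    return -1;
#endif
}

/* Compute the slot's offset relative to the fs base for this thread. */
int span_slot_offset(intptr_t *off)
{
    uintptr_t base;

    if (get_fs_base(&base) != 0)
        return -1;
    *off = (intptr_t)((uintptr_t)&otel_span_id - base);
    return 0;
}

/* What a sampler would conceptually do: read 8 bytes at fs base + off. */
int read_span_via_fs(intptr_t off, uint64_t *out)
{
    uintptr_t base;

    assert(out != NULL);
    if (get_fs_base(&base) != 0)
        return -1;
    memcpy(out, (const void *)(base + (uintptr_t)off), sizeof(*out));
    return 0;
}
```

A profiler would learn the offset once per process (for example from the SDK or from symbol information) and then need only the per-thread fs base plus that offset on each sample. Whether the offset is stable across threads depends on how the TLS block is allocated, so treat that as an assumption to verify.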

Is this already possible via some other means? It'd be great to be able
to do this directly at the perf_event sample via the ABI or a probe.

Thanks,
-Beau

1. https://opentelemetry.io/blog/2024/profiling/
2. https://www.elastic.co/blog/continuous-profiling-distributed-tracing-correlation


2024-04-09 23:33:11

by Namhyung Kim

Subject: Re: Copying TLS/user register data per perf-sample?

Hello,

On Thu, Apr 4, 2024 at 12:26 PM Beau Belgrave <[email protected]> wrote:
>
> Hello,
>
> I'm looking into the possibility of capturing user data that is pointed
> to by a user register (IE: fs/gs for TLS on x86/64) for each sample via
> perf_events.
>
> I was hoping to find a way to do this similar to PERF_SAMPLE_STACK_USER.
> I think it could even use roughly the same ABI in the perf ring buffer.
> Or it may be possible by some kprobe linked to the perf sample function.
>
> This would allow a profiler to collect TLS (or other values) on x64. In
> the Open Telemetry profiling SIG [1], we are trying to find a fast way
> to grab a tracing association quickly on a per-thread basis. The team
> at Elastic has a bespoke way to do this [2], however, I'd like to see a
> more general way to achieve this. The folks I've been talking with seem
> open to the idea of just having a TLS value for this we could capture
> upon each sample. We could then just state, Open Telemetry SDKs should
> have a TLS value for span correlation. However, we need a way to sample
> the TLS value(s) when a sampling event is generated.
>
> Is this already possible via some other means? It'd be great to be able
> to do this directly at the perf_event sample via the ABI or a probe.

I don't think the current perf ABI allows capturing %fs/%gs + offset.
IIRC kprobes/uprobes don't have that either, but I could be wrong.

Thanks,
Namhyung

>
> 1. https://opentelemetry.io/blog/2024/profiling/
> 2. https://www.elastic.co/blog/continuous-profiling-distributed-tracing-correlation

2024-04-10 13:10:12

by Masami Hiramatsu

Subject: Re: Copying TLS/user register data per perf-sample?

On Thu, 4 Apr 2024 12:26:41 -0700
Beau Belgrave <[email protected]> wrote:

> Hello,
>
> I'm looking into the possibility of capturing user data that is pointed
> to by a user register (IE: fs/gs for TLS on x86/64) for each sample via
> perf_events.
>
> I was hoping to find a way to do this similar to PERF_SAMPLE_STACK_USER.
> I think it could even use roughly the same ABI in the perf ring buffer.
> Or it may be possible by some kprobe linked to the perf sample function.
>
> This would allow a profiler to collect TLS (or other values) on x64. In
> the Open Telemetry profiling SIG [1], we are trying to find a fast way
> to grab a tracing association quickly on a per-thread basis. The team
> at Elastic has a bespoke way to do this [2], however, I'd like to see a
> more general way to achieve this. The folks I've been talking with seem
> open to the idea of just having a TLS value for this we could capture
> upon each sample. We could then just state, Open Telemetry SDKs should
> have a TLS value for span correlation. However, we need a way to sample
> the TLS value(s) when a sampling event is generated.
>
> Is this already possible via some other means? It'd be great to be able
> to do this directly at the perf_event sample via the ABI or a probe.
>

Have you tried using uprobes? They should be able to access user-space
registers including fs/gs.

Thank you,

--
Masami Hiramatsu (Google) <[email protected]>

2024-04-10 15:38:57

by Beau Belgrave

Subject: Re: Copying TLS/user register data per perf-sample?

On Tue, Apr 09, 2024 at 04:32:46PM -0700, Namhyung Kim wrote:
> Hello,
>
> On Thu, Apr 4, 2024 at 12:26 PM Beau Belgrave <[email protected]> wrote:
> >
> > Hello,
> >
> > I'm looking into the possibility of capturing user data that is pointed
> > to by a user register (IE: fs/gs for TLS on x86/64) for each sample via
> > perf_events.
> >
> > I was hoping to find a way to do this similar to PERF_SAMPLE_STACK_USER.
> > I think it could even use roughly the same ABI in the perf ring buffer.
> > Or it may be possible by some kprobe linked to the perf sample function.
> >
> > This would allow a profiler to collect TLS (or other values) on x64. In
> > the Open Telemetry profiling SIG [1], we are trying to find a fast way
> > to grab a tracing association quickly on a per-thread basis. The team
> > at Elastic has a bespoke way to do this [2], however, I'd like to see a
> > more general way to achieve this. The folks I've been talking with seem
> > open to the idea of just having a TLS value for this we could capture
> > upon each sample. We could then just state, Open Telemetry SDKs should
> > have a TLS value for span correlation. However, we need a way to sample
> > the TLS value(s) when a sampling event is generated.
> >
> > Is this already possible via some other means? It'd be great to be able
> > to do this directly at the perf_event sample via the ABI or a probe.
>
> I don't think the current perf ABI allows capturing %fs/%gs + offset.
> IIRC kprobes/uprobes don't have that too but I could be wrong.
>

Yeah, I didn't see it either. I have some patches that I will submit in
a bit as RFC that enable this functionality. I was hoping there was
already an easy way to do this.

Thanks,
-Beau

> Thanks,
> Namhyung
>
> >
> > 1. https://opentelemetry.io/blog/2024/profiling/
> > 2. https://www.elastic.co/blog/continuous-profiling-distributed-tracing-correlation

2024-04-10 15:38:58

by Beau Belgrave

Subject: Re: Copying TLS/user register data per perf-sample?

On Wed, Apr 10, 2024 at 10:06:28PM +0900, Masami Hiramatsu wrote:
> On Thu, 4 Apr 2024 12:26:41 -0700
> Beau Belgrave <[email protected]> wrote:
>
> > Hello,
> >
> > I'm looking into the possibility of capturing user data that is pointed
> > to by a user register (IE: fs/gs for TLS on x86/64) for each sample via
> > perf_events.
> >
> > I was hoping to find a way to do this similar to PERF_SAMPLE_STACK_USER.
> > I think it could even use roughly the same ABI in the perf ring buffer.
> > Or it may be possible by some kprobe linked to the perf sample function.
> >
> > This would allow a profiler to collect TLS (or other values) on x64. In
> > the Open Telemetry profiling SIG [1], we are trying to find a fast way
> > to grab a tracing association quickly on a per-thread basis. The team
> > at Elastic has a bespoke way to do this [2], however, I'd like to see a
> > more general way to achieve this. The folks I've been talking with seem
> > open to the idea of just having a TLS value for this we could capture
> > upon each sample. We could then just state, Open Telemetry SDKs should
> > have a TLS value for span correlation. However, we need a way to sample
> > the TLS value(s) when a sampling event is generated.
> >
> > Is this already possible via some other means? It'd be great to be able
> > to do this directly at the perf_event sample via the ABI or a probe.
> >
>
> Have you tried to use uprobes? It should be able to access user-space
> registers including fs/gs.
>

We need to get fs/gs during a sample interrupt from perf. If the sample
interrupt lands in kernel code (e.g., during a syscall), we would also
like to get these TLS values when in process context.

I have some kernel patches that make this possible via perf_events, and
they work well; however, I don't want to reinvent the wheel if there is
already some way to get these via perf samples.

In OTel, we are trying to attribute samples to transactions that are
occurring, so the TLS fetch has to be aligned exactly with the sample.
You can do this via eBPF when it's available; however, we have
environments where eBPF is not available.

It sounds like doing this properly without eBPF would require a new
feature. If so, I do have some patches I can share in a bit as an RFC.

Thanks,
-Beau

> Thank you,
>
> --
> Masami Hiramatsu (Google) <[email protected]>

2024-04-11 17:24:48

by Masami Hiramatsu

Subject: Re: Copying TLS/user register data per perf-sample?

On Wed, 10 Apr 2024 08:35:42 -0700
Beau Belgrave <[email protected]> wrote:

> On Wed, Apr 10, 2024 at 10:06:28PM +0900, Masami Hiramatsu wrote:
> > On Thu, 4 Apr 2024 12:26:41 -0700
> > Beau Belgrave <[email protected]> wrote:
> >
> > > Hello,
> > >
> > > I'm looking into the possibility of capturing user data that is pointed
> > > to by a user register (IE: fs/gs for TLS on x86/64) for each sample via
> > > perf_events.
> > >
> > > I was hoping to find a way to do this similar to PERF_SAMPLE_STACK_USER.
> > > I think it could even use roughly the same ABI in the perf ring buffer.
> > > Or it may be possible by some kprobe linked to the perf sample function.
> > >
> > > This would allow a profiler to collect TLS (or other values) on x64. In
> > > the Open Telemetry profiling SIG [1], we are trying to find a fast way
> > > to grab a tracing association quickly on a per-thread basis. The team
> > > at Elastic has a bespoke way to do this [2], however, I'd like to see a
> > > more general way to achieve this. The folks I've been talking with seem
> > > open to the idea of just having a TLS value for this we could capture
> > > upon each sample. We could then just state, Open Telemetry SDKs should
> > > have a TLS value for span correlation. However, we need a way to sample
> > > the TLS value(s) when a sampling event is generated.
> > >
> > > Is this already possible via some other means? It'd be great to be able
> > > to do this directly at the perf_event sample via the ABI or a probe.
> > >
> >
> > Have you tried to use uprobes? It should be able to access user-space
> > registers including fs/gs.
> >
>
> We need to get fs/gs during a sample interrupt from perf. If the sample
> interrupt lands during kernel code (IE: syscall) we would also like to
> get these TLS values when in process context.

OK, those are not directly accessible from pt_regs.

>
> I have some patches into the kernel to make this possible via
> perf_events that works well, however, I don't want to reinvent the wheel
> if there is some way to get these via perf samples already.

I would like to see it. I think it is possible to introduce a helper
to get the base address of user TLS for probe events, starting with
x86 support.

>
> In OTel, we are trying to attribute samples to transactions that are
> occurring. So the TLS fetch has to be aligned exactly with the sample.
> You can do this via eBPF when it's available, however, we have
> environments where eBPF is not available.
>
> It's sounding like to do this properly without eBPF a new feature would
> be required. If so, I do have some patches I can share in a bit as an
> RFC.

It would be better to share it at the RFC stage so that we can discuss
the overall direction.

Thank you,

>
> Thanks,
> -Beau
>
> > Thank you,
> >
> > --
> > Masami Hiramatsu (Google) <[email protected]>


--
Masami Hiramatsu (Google) <[email protected]>

2024-04-11 17:27:34

by Beau Belgrave

Subject: Re: Copying TLS/user register data per perf-sample?

On Fri, Apr 12, 2024 at 12:55:19AM +0900, Masami Hiramatsu wrote:
> On Wed, 10 Apr 2024 08:35:42 -0700
> Beau Belgrave <[email protected]> wrote:
>
> > On Wed, Apr 10, 2024 at 10:06:28PM +0900, Masami Hiramatsu wrote:
> > > On Thu, 4 Apr 2024 12:26:41 -0700
> > > Beau Belgrave <[email protected]> wrote:
> > >
> > > > Hello,
> > > >
> > > > I'm looking into the possibility of capturing user data that is pointed
> > > > to by a user register (IE: fs/gs for TLS on x86/64) for each sample via
> > > > perf_events.
> > > >
> > > > I was hoping to find a way to do this similar to PERF_SAMPLE_STACK_USER.
> > > > I think it could even use roughly the same ABI in the perf ring buffer.
> > > > Or it may be possible by some kprobe linked to the perf sample function.
> > > >
> > > > This would allow a profiler to collect TLS (or other values) on x64. In
> > > > the Open Telemetry profiling SIG [1], we are trying to find a fast way
> > > > to grab a tracing association quickly on a per-thread basis. The team
> > > > at Elastic has a bespoke way to do this [2], however, I'd like to see a
> > > > more general way to achieve this. The folks I've been talking with seem
> > > > open to the idea of just having a TLS value for this we could capture
> > > > upon each sample. We could then just state, Open Telemetry SDKs should
> > > > have a TLS value for span correlation. However, we need a way to sample
> > > > the TLS value(s) when a sampling event is generated.
> > > >
> > > > Is this already possible via some other means? It'd be great to be able
> > > > to do this directly at the perf_event sample via the ABI or a probe.
> > > >
> > >
> > > Have you tried to use uprobes? It should be able to access user-space
> > > registers including fs/gs.
> > >
> >
> > We need to get fs/gs during a sample interrupt from perf. If the sample
> > interrupt lands during kernel code (IE: syscall) we would also like to
> > get these TLS values when in process context.
>
> OK, those are not directly accessible from pt_regs.
>

Yeah, it's a per-arch thread attribute.

> >
> > I have some patches into the kernel to make this possible via
> > perf_events that works well, however, I don't want to reinvent the wheel
> > if there is some way to get these via perf samples already.
>
> I would like to see it. I think it is possible to introduce a helper
> to get a base address of user TLS for probe events, and start supporting
> from x86.
>

For sure, I'm hoping the patches start the right conversations.

> >
> > In OTel, we are trying to attribute samples to transactions that are
> > occurring. So the TLS fetch has to be aligned exactly with the sample.
> > You can do this via eBPF when it's available, however, we have
> > environments where eBPF is not available.
> >
> > It's sounding like to do this properly without eBPF a new feature would
> > be required. If so, I do have some patches I can share in a bit as an
> > RFC.
>
> It is better to be shared in RFC stage, so that we can discuss it from
> the direction level.
>

Agreed. It could be that having the ability to run a probe on each
sample is a better option; I'm not sure.

Thanks,
-Beau

> Thank you,
>
> >
> > Thanks,
> > -Beau
> >
> > > Thank you,
> > >
> > > --
> > > Masami Hiramatsu (Google) <[email protected]>
>
>
> --
> Masami Hiramatsu (Google) <[email protected]>