LinuxLists.cc - callchain ABI change with commit 6cbc304f2f360

2020-05-06 03:41:54

Subject: callchain ABI change with commit 6cbc304f2f360

Hi,

I have received reports from users who have noticed a change of
behaviour caused by commit:

6cbc304f2f360 ("perf/x86/intel: Fix unwind errors from PEBS entries (mk-II)")

when using PEBS sampling on Intel processors.

Doing simple profiling with:
$ perf record -g -e cycles:pp ...

Before:

1 1595951041120856 0x7f77f8 [0xe8]: PERF_RECORD_SAMPLE(IP, 0x4002):
795385/690513: 0x558aa66a9607 period: 10000019 addr: 0
... FP chain: nr:22
..... 0: fffffffffffffe00
..... 1: 0000558aa66a9607
..... 2: 0000558aa66a8751
..... 3: 0000558a984a3d4f

Entry 1 matches the sampled IP, 0x558aa66a9607.

After:

3 487420973381085 0x2f797c0 [0x90]: PERF_RECORD_SAMPLE(IP, 0x4002):
349591/146458: 0x559dcd2ef889 period: 10000019 addr: 0
... FP chain: nr:11
..... 0: fffffffffffffe00
..... 1: 0000559dcd2ef88b
..... 2: 0000559dcd19787d
..... 3: 0000559dcd1cf1be

Entry 1 no longer matches the sampled IP.

Before the patch, the kernel stashed the sampled IP from PEBS into
the callchain. After the patch, it stashes the interrupted IP, which
includes the skid.
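
For readers checking their own dumps, the comparison above can be sketched in Python. This is only an illustrative helper, not part of perf: the name `first_frame_skid` is hypothetical. It assumes the first callchain entry, fffffffffffffe00, is the `PERF_CONTEXT_USER` marker ((__u64)-512) and that any value at or above `PERF_CONTEXT_MAX` ((__u64)-4095) is a context marker rather than a real address, as defined in the perf_event uapi header.

```python
# Context markers in a perf callchain are encoded as very large u64 values;
# anything >= PERF_CONTEXT_MAX is a marker, not a real address.
PERF_CONTEXT_MAX = (1 << 64) - 4095   # (__u64)-4095
PERF_CONTEXT_USER = (1 << 64) - 512   # (__u64)-512 == 0xfffffffffffffe00


def first_frame_skid(sample_ip, callchain):
    """Return (first non-marker callchain entry) - sample_ip, or None."""
    for entry in callchain:
        if entry >= PERF_CONTEXT_MAX:
            continue  # skip context markers such as PERF_CONTEXT_USER
        return entry - sample_ip
    return None  # the chain held only markers, or was empty


# Values copied from the two samples shown above.
before = first_frame_skid(
    0x558AA66A9607,
    [0xFFFFFFFFFFFFFE00, 0x558AA66A9607, 0x558AA66A8751, 0x558A984A3D4F],
)
after = first_frame_skid(
    0x559DCD2EF889,
    [0xFFFFFFFFFFFFFE00, 0x559DCD2EF88B, 0x559DCD19787D, 0x559DCD1CF1BE],
)
print(before, after)  # prints "0 2"
```

A result of 0 means entry 1 equals the sampled IP; a non-zero result is the distance in bytes between the sampled IP and the first callchain frame.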

I am trying to understand whether this is an intentional change or not
for the IP.

It seems that stashing the interrupted IP would be more consistent across all
sampling modes, i.e., with and without PEBS. Entry 1 would always be
the interrupted IP.
The changelog talks about the ORC unwinder being happier with the
interrupted machine state, but not about the ABI expectation here.
Could you clarify?
Thanks.

2020-05-06 11:39:30

by Peter Zijlstra

Subject: Re: callchain ABI change with commit 6cbc304f2f360

On Tue, May 05, 2020 at 08:37:40PM -0700, Stephane Eranian wrote:
> Hi,
>
> I have received reports from users who have noticed a change of
> behaviour caused by
> commit:
>
> 6cbc304f2f360 ("perf/x86/intel: Fix unwind errors from PEBS entries (mk-II)")
>
> When using PEBS sampling on Intel processors.
>
> Doing simple profiling with:
> $ perf record -g -e cycles:pp ...
>
> Before:
>
> 1 1595951041120856 0x7f77f8 [0xe8]: PERF_RECORD_SAMPLE(IP, 0x4002):
> 795385/690513: 0x558aa66a9607 period: 10000019 addr: 0
> ... FP chain: nr:22
> ..... 0: fffffffffffffe00
> ..... 1: 0000558aa66a9607
> ..... 2: 0000558aa66a8751
> ..... 3: 0000558a984a3d4f
>
> Entry 1: matches sampled IP 0x558aa66a9607.
>
> After:
>
> 3 487420973381085 0x2f797c0 [0x90]: PERF_RECORD_SAMPLE(IP, 0x4002):
> 349591/146458: 0x559dcd2ef889 period: 10000019 addr: 0
> ... FP chain: nr:11
> ..... 0: fffffffffffffe00
> ..... 1: 0000559dcd2ef88b
> ..... 2: 0000559dcd19787d
> ..... 3: 0000559dcd1cf1be
>
> entry 1 does not match sampled IP anymore.
>
> Before the patch the kernel was stashing the sampled IP from PEBS into
> the callchain. After the patch it is stashing the interrupted IP, thus
> with the skid.
>
> I am trying to understand whether this is an intentional change or not
> for the IP.
>
> It seems that stashing the interrupted IP would be more consistent across all
> sampling modes, i.e., with and without PEBS. Entry 1: would always be
> the interrupted IP.
> The changelog talks about the ORC unwinder being happier with the
> interrupted machine state, but not about the ABI expectation here.
> Could you clarify?

Intentional; fundamentally, we cannot unwind a stack that no longer
exists.

The PEBS record comes in after the fact, the stack at the time of record
is irretrievably gone. The only (and best) thing we can do is provide
the unwind at the interrupt.

Adding a previous IP on top of a later unwind gives completely
insane/broken call-stacks.
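
A toy model may help show why such a mixed chain is misleading. Everything in this sketch is hypothetical (the addresses, the function names, and the `symbolize` helper), and it is a simplification rather than the kernel's actual code path. It assumes PEBS captured an IP in `caller()` just before a call, while the interrupt arrived once execution was already inside `leaf()`.

```python
# Hypothetical symbol table: function start address -> name.
SYMBOLS = {0x1000: "main", 0x2000: "caller", 0x3000: "leaf"}


def symbolize(addr):
    """Map an address to 'function+offset' using the toy symbol table."""
    base = max(b for b in SYMBOLS if b <= addr)
    return f"{SYMBOLS[base]}+{addr - base:#x}"


# IP recorded by PEBS: inside caller(), just before its call to leaf().
pebs_ip = 0x2010

# Stack as it exists when the interrupt is taken, already inside leaf().
# This is the only stack that can still be unwound.
unwind_at_interrupt = [0x3004, 0x2015, 0x1020]

# Mixed chain: the earlier PEBS IP placed on top of the later unwind.
mixed = [pebs_ip] + unwind_at_interrupt[1:]

print([symbolize(a) for a in unwind_at_interrupt])
print([symbolize(a) for a in mixed])
```

In this model the interrupt-time unwind reads leaf, caller, main, which is self-consistent. The mixed chain reads caller, caller, main, which suggests a call from caller() into itself that never took place.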

2020-05-06 18:51:52

by Stephane Eranian

Subject: Re: callchain ABI change with commit 6cbc304f2f360

On Wed, May 6, 2020 at 4:37 AM Peter Zijlstra wrote:
>
> On Tue, May 05, 2020 at 08:37:40PM -0700, Stephane Eranian wrote:
> > Hi,
> >
> > I have received reports from users who have noticed a change of
> > behaviour caused by
> > commit:
> >
> > 6cbc304f2f360 ("perf/x86/intel: Fix unwind errors from PEBS entries (mk-II)")
> >
> > When using PEBS sampling on Intel processors.
> >
> > Doing simple profiling with:
> > $ perf record -g -e cycles:pp ...
> >
> > Before:
> >
> > 1 1595951041120856 0x7f77f8 [0xe8]: PERF_RECORD_SAMPLE(IP, 0x4002):
> > 795385/690513: 0x558aa66a9607 period: 10000019 addr: 0
> > ... FP chain: nr:22
> > ..... 0: fffffffffffffe00
> > ..... 1: 0000558aa66a9607
> > ..... 2: 0000558aa66a8751
> > ..... 3: 0000558a984a3d4f
> >
> > Entry 1: matches sampled IP 0x558aa66a9607.
> >
> > After:
> >
> > 3 487420973381085 0x2f797c0 [0x90]: PERF_RECORD_SAMPLE(IP, 0x4002):
> > 349591/146458: 0x559dcd2ef889 period: 10000019 addr: 0
> > ... FP chain: nr:11
> > ..... 0: fffffffffffffe00
> > ..... 1: 0000559dcd2ef88b
> > ..... 2: 0000559dcd19787d
> > ..... 3: 0000559dcd1cf1be
> >
> > entry 1 does not match sampled IP anymore.
> >
> > Before the patch the kernel was stashing the sampled IP from PEBS into
> > the callchain. After the patch it is stashing the interrupted IP, thus
> > with the skid.
> >
> > I am trying to understand whether this is an intentional change or not
> > for the IP.
> >
> > It seems that stashing the interrupted IP would be more consistent across all
> > sampling modes, i.e., with and without PEBS. Entry 1: would always be
> > the interrupted IP.
> > The changelog talks about the ORC unwinder being happier with the
> > interrupted machine state, but not about the ABI expectation here.
> > Could you clarify?
>
> Intentional; fundamentally, we cannot unwind a stack that no longer
> exists.
>
Ok, thanks for clarifying this.

> The PEBS record comes in after the fact, the stack at the time of record
> is irretrievably gone. The only (and best) thing we can do is provide
> the unwind at the interrupt.
>
The PEBS record is always at an IP BEFORE or EQUAL to the interrupted IP.
The stack at PEBS may be gone in case the PEBS sample was captured at the
epilogue of the function where sp/rbp are modified.

> Adding a previous IP on top of a later unwind gives completely
> insane/broken call-stacks.

I agree that using the interrupted IP is the most reliable thing to do.

You can get the callstack at the PEBS sample with LBR callstack on Icelake
because PEBS can record LBR. I am hoping this works with the existing code.