2022-08-10 21:18:02

by Stephane Eranian

[permalink] [raw]
Subject: [PATCH] perf/x86/intel/lbr: fix branch type encoding

With architected LBR, the procesosr can record the type of each sampled taken
branch. The type is encoded in 4-bit field in the LBR_INFO MSR of each entry.

The branch type must then extracted and saved in the perf_branch_entry in the
perf_events sampling buffer. With the current code, the raw Intel encoding of
the branch is exported to user tools. Yet tools, such as perf, expected the
branch type to be encoded using perf_events branch type enum
(see tools/perf/util/branch.c). As a result of the discrepancy, the output of
perf report -D shows bogus branch types.

Fix the problem by converting the Intel raw encoding into the perf_events
branch type enum values. With that in place and with no changes to the tools,
the branch types are now reported properly.

Signed-off-by: Stephane Eranian <[email protected]>
---
arch/x86/events/intel/lbr.c | 35 ++++++++++++++++++++++++++++++++---
1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 4f70fb6c2c1e..ef63d4d46b50 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -894,9 +894,23 @@ static DEFINE_STATIC_KEY_FALSE(x86_lbr_mispred);
static DEFINE_STATIC_KEY_FALSE(x86_lbr_cycles);
static DEFINE_STATIC_KEY_FALSE(x86_lbr_type);

-static __always_inline int get_lbr_br_type(u64 info)
+/*
+ * Array index encodes IA32_LBR_x_INFO Branch Type Encodings
+ * as per Intel SDM Vol3b Branch Types section
+ */
+static const int arch_lbr_type_map[]={
+ [0] = PERF_BR_COND,
+ [1] = PERF_BR_IND,
+ [2] = PERF_BR_UNCOND,
+ [3] = PERF_BR_IND_CALL,
+ [4] = PERF_BR_CALL,
+ [5] = PERF_BR_RET,
+};
+#define ARCH_LBR_TYPE_COUNT ARRAY_SIZE(arch_lbr_type_map)
+
+static __always_inline u16 get_lbr_br_type(u64 info)
{
- int type = 0;
+ u16 type = 0;

if (static_branch_likely(&x86_lbr_type))
type = (info & LBR_INFO_BR_TYPE) >> LBR_INFO_BR_TYPE_OFFSET;
@@ -904,6 +918,21 @@ static __always_inline int get_lbr_br_type(u64 info)
return type;
}

+/*
+ * The kernel cannot expose raw Intel branch type encodings because they are
+ * not generic. Instead, the function below maps the encoding to the
+ * perf_events user visible branch types.
+ */
+static __always_inline int get_lbr_br_type_mapping(u64 info)
+{
+ if (static_branch_likely(&x86_lbr_type)) {
+ u16 raw_type = get_lbr_br_type(info);
+ if (raw_type < ARCH_LBR_TYPE_COUNT)
+ return arch_lbr_type_map[raw_type];
+ }
+ return PERF_BR_UNKNOWN;
+}
+
static __always_inline bool get_lbr_mispred(u64 info)
{
bool mispred = 0;
@@ -957,7 +986,7 @@ static void intel_pmu_store_lbr(struct cpu_hw_events *cpuc,
e->in_tx = !!(info & LBR_INFO_IN_TX);
e->abort = !!(info & LBR_INFO_ABORT);
e->cycles = get_lbr_cycles(info);
- e->type = get_lbr_br_type(info);
+ e->type = get_lbr_br_type_mapping(info);
}

cpuc->lbr_stack.nr = i;
--
2.37.1.559.g78731f0fdb-goog


2022-08-11 12:33:49

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH] perf/x86/intel/lbr: fix branch type encoding



On 2022-08-10 5:06 p.m., Stephane Eranian wrote:
> With architected LBR, the procesosr can record the type of each sampled taken
> branch. The type is encoded in 4-bit field in the LBR_INFO MSR of each entry.
>
> The branch type must then extracted and saved in the perf_branch_entry in the
> perf_events sampling buffer. With the current code, the raw Intel encoding of
> the branch is exported to user tools.

In the intel_pmu_lbr_filter(), the raw encoding will be converted into
the X86_BR_* format via arch_lbr_br_type_map[]. Then the
common_branch_type() will convert the X86_BR_* format to the generic
PERF_BR_* type and expose to user tools.

I double check the existing arch_lbr_br_type_map[] and branch_map[].
They should generate the same PERF_BR_* type as your arch_lbr_type_map[].

Is there a test case which I can use to reproduce the problem?

Thanks,
Kan

> Yet tools, such as perf, expected the
> branch type to be encoded using perf_events branch type enum
> (see tools/perf/util/branch.c). As a result of the discrepancy, the output of
> perf report -D shows bogus branch types.
>
> Fix the problem by converting the Intel raw encoding into the perf_events
> branch type enum values. With that in place and with no changes to the tools,
> the branch types are now reported properly.
>
> Signed-off-by: Stephane Eranian <[email protected]>
> ---
> arch/x86/events/intel/lbr.c | 35 ++++++++++++++++++++++++++++++++---
> 1 file changed, 32 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
> index 4f70fb6c2c1e..ef63d4d46b50 100644
> --- a/arch/x86/events/intel/lbr.c
> +++ b/arch/x86/events/intel/lbr.c
> @@ -894,9 +894,23 @@ static DEFINE_STATIC_KEY_FALSE(x86_lbr_mispred);
> static DEFINE_STATIC_KEY_FALSE(x86_lbr_cycles);
> static DEFINE_STATIC_KEY_FALSE(x86_lbr_type);
>
> -static __always_inline int get_lbr_br_type(u64 info)
> +/*
> + * Array index encodes IA32_LBR_x_INFO Branch Type Encodings
> + * as per Intel SDM Vol3b Branch Types section
> + */
> +static const int arch_lbr_type_map[]={
> + [0] = PERF_BR_COND,
> + [1] = PERF_BR_IND,
> + [2] = PERF_BR_UNCOND,
> + [3] = PERF_BR_IND_CALL,
> + [4] = PERF_BR_CALL,
> + [5] = PERF_BR_RET,
> +};
> +#define ARCH_LBR_TYPE_COUNT ARRAY_SIZE(arch_lbr_type_map)
> +
> +static __always_inline u16 get_lbr_br_type(u64 info)
> {
> - int type = 0;
> + u16 type = 0;
>
> if (static_branch_likely(&x86_lbr_type))
> type = (info & LBR_INFO_BR_TYPE) >> LBR_INFO_BR_TYPE_OFFSET;
> @@ -904,6 +918,21 @@ static __always_inline int get_lbr_br_type(u64 info)
> return type;
> }
>
> +/*
> + * The kernel cannot expose raw Intel branch type encodings because they are
> + * not generic. Instead, the function below maps the encoding to the
> + * perf_events user visible branch types.
> + */
> +static __always_inline int get_lbr_br_type_mapping(u64 info)
> +{
> + if (static_branch_likely(&x86_lbr_type)) {
> + u16 raw_type = get_lbr_br_type(info);
> + if (raw_type < ARCH_LBR_TYPE_COUNT)
> + return arch_lbr_type_map[raw_type];
> + }
> + return PERF_BR_UNKNOWN;
> +}
> +
> static __always_inline bool get_lbr_mispred(u64 info)
> {
> bool mispred = 0;
> @@ -957,7 +986,7 @@ static void intel_pmu_store_lbr(struct cpu_hw_events *cpuc,
> e->in_tx = !!(info & LBR_INFO_IN_TX);
> e->abort = !!(info & LBR_INFO_ABORT);
> e->cycles = get_lbr_cycles(info);
> - e->type = get_lbr_br_type(info);
> + e->type = get_lbr_br_type_mapping(info);
> }
>
> cpuc->lbr_stack.nr = i;

2022-08-11 14:50:34

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH] perf/x86/intel/lbr: fix branch type encoding



On 2022-08-11 10:17 a.m., Stephane Eranian wrote:
> On Thu, Aug 11, 2022 at 3:23 PM Liang, Kan <[email protected]> wrote:
>>
>>
>>
>> On 2022-08-10 5:06 p.m., Stephane Eranian wrote:
>>> With architected LBR, the procesosr can record the type of each sampled taken
>>> branch. The type is encoded in 4-bit field in the LBR_INFO MSR of each entry.
>>>
>>> The branch type must then extracted and saved in the perf_branch_entry in the
>>> perf_events sampling buffer. With the current code, the raw Intel encoding of
>>> the branch is exported to user tools.
>>
>> In the intel_pmu_lbr_filter(), the raw encoding will be converted into
>> the X86_BR_* format via arch_lbr_br_type_map[]. Then the
>> common_branch_type() will convert the X86_BR_* format to the generic
>> PERF_BR_* type and expose to user tools.
>>
>> I double check the existing arch_lbr_br_type_map[] and branch_map[].
>> They should generate the same PERF_BR_* type as your arch_lbr_type_map[].
>>
>> Is there a test case which I can use to reproduce the problem?
>>
> I was doing a simple:
> $ perf record -b -e cpu/event=0xc4/ ....
> $ perf report -D
> Looking at the LBR information and the BR type, many entries has no branch type.
> What I see is a function where you do: e->type = get_lbr_br_type() and
> that is what
> is then saved in the buffer. Unless I am missing a later patch.
>

To get the LBR type, the save_type filter option must be applied. See
60f83fa6341d ("perf record: Create a new option save_type in
--branch-filter").

The -b only include the ANY option. Maybe we should extend the -b option
to ANY|SAVE_TYPE.

Thanks,
Kan

>
>> Thanks,
>> Kan
>>
>>> Yet tools, such as perf, expected the
>>> branch type to be encoded using perf_events branch type enum
>>> (see tools/perf/util/branch.c). As a result of the discrepancy, the output of
>>> perf report -D shows bogus branch types.
>>>
>>> Fix the problem by converting the Intel raw encoding into the perf_events
>>> branch type enum values. With that in place and with no changes to the tools,
>>> the branch types are now reported properly.
>>>
>>> Signed-off-by: Stephane Eranian <[email protected]>
>>> ---
>>> arch/x86/events/intel/lbr.c | 35 ++++++++++++++++++++++++++++++++---
>>> 1 file changed, 32 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
>>> index 4f70fb6c2c1e..ef63d4d46b50 100644
>>> --- a/arch/x86/events/intel/lbr.c
>>> +++ b/arch/x86/events/intel/lbr.c
>>> @@ -894,9 +894,23 @@ static DEFINE_STATIC_KEY_FALSE(x86_lbr_mispred);
>>> static DEFINE_STATIC_KEY_FALSE(x86_lbr_cycles);
>>> static DEFINE_STATIC_KEY_FALSE(x86_lbr_type);
>>>
>>> -static __always_inline int get_lbr_br_type(u64 info)
>>> +/*
>>> + * Array index encodes IA32_LBR_x_INFO Branch Type Encodings
>>> + * as per Intel SDM Vol3b Branch Types section
>>> + */
>>> +static const int arch_lbr_type_map[]={
>>> + [0] = PERF_BR_COND,
>>> + [1] = PERF_BR_IND,
>>> + [2] = PERF_BR_UNCOND,
>>> + [3] = PERF_BR_IND_CALL,
>>> + [4] = PERF_BR_CALL,
>>> + [5] = PERF_BR_RET,
>>> +};
>>> +#define ARCH_LBR_TYPE_COUNT ARRAY_SIZE(arch_lbr_type_map)
>>> +
>>> +static __always_inline u16 get_lbr_br_type(u64 info)
>>> {
>>> - int type = 0;
>>> + u16 type = 0;
>>>
>>> if (static_branch_likely(&x86_lbr_type))
>>> type = (info & LBR_INFO_BR_TYPE) >> LBR_INFO_BR_TYPE_OFFSET;
>>> @@ -904,6 +918,21 @@ static __always_inline int get_lbr_br_type(u64 info)
>>> return type;
>>> }
>>>
>>> +/*
>>> + * The kernel cannot expose raw Intel branch type encodings because they are
>>> + * not generic. Instead, the function below maps the encoding to the
>>> + * perf_events user visible branch types.
>>> + */
>>> +static __always_inline int get_lbr_br_type_mapping(u64 info)
>>> +{
>>> + if (static_branch_likely(&x86_lbr_type)) {
>>> + u16 raw_type = get_lbr_br_type(info);
>>> + if (raw_type < ARCH_LBR_TYPE_COUNT)
>>> + return arch_lbr_type_map[raw_type];
>>> + }
>>> + return PERF_BR_UNKNOWN;
>>> +}
>>> +
>>> static __always_inline bool get_lbr_mispred(u64 info)
>>> {
>>> bool mispred = 0;
>>> @@ -957,7 +986,7 @@ static void intel_pmu_store_lbr(struct cpu_hw_events *cpuc,
>>> e->in_tx = !!(info & LBR_INFO_IN_TX);
>>> e->abort = !!(info & LBR_INFO_ABORT);
>>> e->cycles = get_lbr_cycles(info);
>>> - e->type = get_lbr_br_type(info);
>>> + e->type = get_lbr_br_type_mapping(info);
>>> }
>>>
>>> cpuc->lbr_stack.nr = i;

2022-08-11 15:05:02

by Stephane Eranian

[permalink] [raw]
Subject: Re: [PATCH] perf/x86/intel/lbr: fix branch type encoding

On Thu, Aug 11, 2022 at 3:23 PM Liang, Kan <[email protected]> wrote:
>
>
>
> On 2022-08-10 5:06 p.m., Stephane Eranian wrote:
> > With architected LBR, the procesosr can record the type of each sampled taken
> > branch. The type is encoded in 4-bit field in the LBR_INFO MSR of each entry.
> >
> > The branch type must then extracted and saved in the perf_branch_entry in the
> > perf_events sampling buffer. With the current code, the raw Intel encoding of
> > the branch is exported to user tools.
>
> In the intel_pmu_lbr_filter(), the raw encoding will be converted into
> the X86_BR_* format via arch_lbr_br_type_map[]. Then the
> common_branch_type() will convert the X86_BR_* format to the generic
> PERF_BR_* type and expose to user tools.
>
> I double check the existing arch_lbr_br_type_map[] and branch_map[].
> They should generate the same PERF_BR_* type as your arch_lbr_type_map[].
>
> Is there a test case which I can use to reproduce the problem?
>
I was doing a simple:
$ perf record -b -e cpu/event=0xc4/ ....
$ perf report -D
Looking at the LBR information and the BR type, many entries has no branch type.
What I see is a function where you do: e->type = get_lbr_br_type() and
that is what
is then saved in the buffer. Unless I am missing a later patch.


> Thanks,
> Kan
>
> > Yet tools, such as perf, expected the
> > branch type to be encoded using perf_events branch type enum
> > (see tools/perf/util/branch.c). As a result of the discrepancy, the output of
> > perf report -D shows bogus branch types.
> >
> > Fix the problem by converting the Intel raw encoding into the perf_events
> > branch type enum values. With that in place and with no changes to the tools,
> > the branch types are now reported properly.
> >
> > Signed-off-by: Stephane Eranian <[email protected]>
> > ---
> > arch/x86/events/intel/lbr.c | 35 ++++++++++++++++++++++++++++++++---
> > 1 file changed, 32 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
> > index 4f70fb6c2c1e..ef63d4d46b50 100644
> > --- a/arch/x86/events/intel/lbr.c
> > +++ b/arch/x86/events/intel/lbr.c
> > @@ -894,9 +894,23 @@ static DEFINE_STATIC_KEY_FALSE(x86_lbr_mispred);
> > static DEFINE_STATIC_KEY_FALSE(x86_lbr_cycles);
> > static DEFINE_STATIC_KEY_FALSE(x86_lbr_type);
> >
> > -static __always_inline int get_lbr_br_type(u64 info)
> > +/*
> > + * Array index encodes IA32_LBR_x_INFO Branch Type Encodings
> > + * as per Intel SDM Vol3b Branch Types section
> > + */
> > +static const int arch_lbr_type_map[]={
> > + [0] = PERF_BR_COND,
> > + [1] = PERF_BR_IND,
> > + [2] = PERF_BR_UNCOND,
> > + [3] = PERF_BR_IND_CALL,
> > + [4] = PERF_BR_CALL,
> > + [5] = PERF_BR_RET,
> > +};
> > +#define ARCH_LBR_TYPE_COUNT ARRAY_SIZE(arch_lbr_type_map)
> > +
> > +static __always_inline u16 get_lbr_br_type(u64 info)
> > {
> > - int type = 0;
> > + u16 type = 0;
> >
> > if (static_branch_likely(&x86_lbr_type))
> > type = (info & LBR_INFO_BR_TYPE) >> LBR_INFO_BR_TYPE_OFFSET;
> > @@ -904,6 +918,21 @@ static __always_inline int get_lbr_br_type(u64 info)
> > return type;
> > }
> >
> > +/*
> > + * The kernel cannot expose raw Intel branch type encodings because they are
> > + * not generic. Instead, the function below maps the encoding to the
> > + * perf_events user visible branch types.
> > + */
> > +static __always_inline int get_lbr_br_type_mapping(u64 info)
> > +{
> > + if (static_branch_likely(&x86_lbr_type)) {
> > + u16 raw_type = get_lbr_br_type(info);
> > + if (raw_type < ARCH_LBR_TYPE_COUNT)
> > + return arch_lbr_type_map[raw_type];
> > + }
> > + return PERF_BR_UNKNOWN;
> > +}
> > +
> > static __always_inline bool get_lbr_mispred(u64 info)
> > {
> > bool mispred = 0;
> > @@ -957,7 +986,7 @@ static void intel_pmu_store_lbr(struct cpu_hw_events *cpuc,
> > e->in_tx = !!(info & LBR_INFO_IN_TX);
> > e->abort = !!(info & LBR_INFO_ABORT);
> > e->cycles = get_lbr_cycles(info);
> > - e->type = get_lbr_br_type(info);
> > + e->type = get_lbr_br_type_mapping(info);
> > }
> >
> > cpuc->lbr_stack.nr = i;

2022-08-11 15:48:54

by Stephane Eranian

[permalink] [raw]
Subject: Re: [PATCH] perf/x86/intel/lbr: fix branch type encoding

On Thu, Aug 11, 2022 at 5:42 PM Liang, Kan <[email protected]> wrote:
>
>
>
> On 2022-08-11 10:17 a.m., Stephane Eranian wrote:
> > On Thu, Aug 11, 2022 at 3:23 PM Liang, Kan <[email protected]> wrote:
> >>
> >>
> >>
> >> On 2022-08-10 5:06 p.m., Stephane Eranian wrote:
> >>> With architected LBR, the procesosr can record the type of each sampled taken
> >>> branch. The type is encoded in 4-bit field in the LBR_INFO MSR of each entry.
> >>>
> >>> The branch type must then extracted and saved in the perf_branch_entry in the
> >>> perf_events sampling buffer. With the current code, the raw Intel encoding of
> >>> the branch is exported to user tools.
> >>
> >> In the intel_pmu_lbr_filter(), the raw encoding will be converted into
> >> the X86_BR_* format via arch_lbr_br_type_map[]. Then the
> >> common_branch_type() will convert the X86_BR_* format to the generic
> >> PERF_BR_* type and expose to user tools.
> >>
> >> I double check the existing arch_lbr_br_type_map[] and branch_map[].
> >> They should generate the same PERF_BR_* type as your arch_lbr_type_map[].
> >>
> >> Is there a test case which I can use to reproduce the problem?
> >>
> > I was doing a simple:
> > $ perf record -b -e cpu/event=0xc4/ ....
> > $ perf report -D
> > Looking at the LBR information and the BR type, many entries has no branch type.
> > What I see is a function where you do: e->type = get_lbr_br_type() and
> > that is what
> > is then saved in the buffer. Unless I am missing a later patch.
> >
>
> To get the LBR type, the save_type filter option must be applied. See
> 60f83fa6341d ("perf record: Create a new option save_type in
> --branch-filter").
>
That seems overly complicated. I don't recall having to pass a new option
to get the LBR latency. It showed up automatically. So why for branch_type?

> The -b only include the ANY option. Maybe we should extend the -b option
> to ANY|SAVE_TYPE.
>
Ok, that explains it then. I think we need to simplify.


> Thanks,
> Kan
>
> >
> >> Thanks,
> >> Kan
> >>
> >>> Yet tools, such as perf, expected the
> >>> branch type to be encoded using perf_events branch type enum
> >>> (see tools/perf/util/branch.c). As a result of the discrepancy, the output of
> >>> perf report -D shows bogus branch types.
> >>>
> >>> Fix the problem by converting the Intel raw encoding into the perf_events
> >>> branch type enum values. With that in place and with no changes to the tools,
> >>> the branch types are now reported properly.
> >>>
> >>> Signed-off-by: Stephane Eranian <[email protected]>
> >>> ---
> >>> arch/x86/events/intel/lbr.c | 35 ++++++++++++++++++++++++++++++++---
> >>> 1 file changed, 32 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
> >>> index 4f70fb6c2c1e..ef63d4d46b50 100644
> >>> --- a/arch/x86/events/intel/lbr.c
> >>> +++ b/arch/x86/events/intel/lbr.c
> >>> @@ -894,9 +894,23 @@ static DEFINE_STATIC_KEY_FALSE(x86_lbr_mispred);
> >>> static DEFINE_STATIC_KEY_FALSE(x86_lbr_cycles);
> >>> static DEFINE_STATIC_KEY_FALSE(x86_lbr_type);
> >>>
> >>> -static __always_inline int get_lbr_br_type(u64 info)
> >>> +/*
> >>> + * Array index encodes IA32_LBR_x_INFO Branch Type Encodings
> >>> + * as per Intel SDM Vol3b Branch Types section
> >>> + */
> >>> +static const int arch_lbr_type_map[]={
> >>> + [0] = PERF_BR_COND,
> >>> + [1] = PERF_BR_IND,
> >>> + [2] = PERF_BR_UNCOND,
> >>> + [3] = PERF_BR_IND_CALL,
> >>> + [4] = PERF_BR_CALL,
> >>> + [5] = PERF_BR_RET,
> >>> +};
> >>> +#define ARCH_LBR_TYPE_COUNT ARRAY_SIZE(arch_lbr_type_map)
> >>> +
> >>> +static __always_inline u16 get_lbr_br_type(u64 info)
> >>> {
> >>> - int type = 0;
> >>> + u16 type = 0;
> >>>
> >>> if (static_branch_likely(&x86_lbr_type))
> >>> type = (info & LBR_INFO_BR_TYPE) >> LBR_INFO_BR_TYPE_OFFSET;
> >>> @@ -904,6 +918,21 @@ static __always_inline int get_lbr_br_type(u64 info)
> >>> return type;
> >>> }
> >>>
> >>> +/*
> >>> + * The kernel cannot expose raw Intel branch type encodings because they are
> >>> + * not generic. Instead, the function below maps the encoding to the
> >>> + * perf_events user visible branch types.
> >>> + */
> >>> +static __always_inline int get_lbr_br_type_mapping(u64 info)
> >>> +{
> >>> + if (static_branch_likely(&x86_lbr_type)) {
> >>> + u16 raw_type = get_lbr_br_type(info);
> >>> + if (raw_type < ARCH_LBR_TYPE_COUNT)
> >>> + return arch_lbr_type_map[raw_type];
> >>> + }
> >>> + return PERF_BR_UNKNOWN;
> >>> +}
> >>> +
> >>> static __always_inline bool get_lbr_mispred(u64 info)
> >>> {
> >>> bool mispred = 0;
> >>> @@ -957,7 +986,7 @@ static void intel_pmu_store_lbr(struct cpu_hw_events *cpuc,
> >>> e->in_tx = !!(info & LBR_INFO_IN_TX);
> >>> e->abort = !!(info & LBR_INFO_ABORT);
> >>> e->cycles = get_lbr_cycles(info);
> >>> - e->type = get_lbr_br_type(info);
> >>> + e->type = get_lbr_br_type_mapping(info);
> >>> }
> >>>
> >>> cpuc->lbr_stack.nr = i;

2022-08-11 16:15:56

by Stephane Eranian

[permalink] [raw]
Subject: Re: [PATCH] perf/x86/intel/lbr: fix branch type encoding

On Thu, Aug 11, 2022 at 6:28 PM Stephane Eranian <[email protected]> wrote:
>
> On Thu, Aug 11, 2022 at 5:42 PM Liang, Kan <[email protected]> wrote:
> >
> >
> >
> > On 2022-08-11 10:17 a.m., Stephane Eranian wrote:
> > > On Thu, Aug 11, 2022 at 3:23 PM Liang, Kan <[email protected]> wrote:
> > >>
> > >>
> > >>
> > >> On 2022-08-10 5:06 p.m., Stephane Eranian wrote:
> > >>> With architected LBR, the procesosr can record the type of each sampled taken
> > >>> branch. The type is encoded in 4-bit field in the LBR_INFO MSR of each entry.
> > >>>
> > >>> The branch type must then extracted and saved in the perf_branch_entry in the
> > >>> perf_events sampling buffer. With the current code, the raw Intel encoding of
> > >>> the branch is exported to user tools.
> > >>
> > >> In the intel_pmu_lbr_filter(), the raw encoding will be converted into
> > >> the X86_BR_* format via arch_lbr_br_type_map[]. Then the
> > >> common_branch_type() will convert the X86_BR_* format to the generic
> > >> PERF_BR_* type and expose to user tools.
> > >>
> > >> I double check the existing arch_lbr_br_type_map[] and branch_map[].
> > >> They should generate the same PERF_BR_* type as your arch_lbr_type_map[].
> > >>
> > >> Is there a test case which I can use to reproduce the problem?
> > >>
> > > I was doing a simple:
> > > $ perf record -b -e cpu/event=0xc4/ ....
> > > $ perf report -D
> > > Looking at the LBR information and the BR type, many entries has no branch type.
> > > What I see is a function where you do: e->type = get_lbr_br_type() and
> > > that is what
> > > is then saved in the buffer. Unless I am missing a later patch.
> > >
> >
> > To get the LBR type, the save_type filter option must be applied. See
> > 60f83fa6341d ("perf record: Create a new option save_type in
> > --branch-filter").
> >
> That seems overly complicated. I don't recall having to pass a new option
> to get the LBR latency. It showed up automatically. So why for branch_type?
>
> > The -b only include the ANY option. Maybe we should extend the -b option
> > to ANY|SAVE_TYPE.
> >
> Ok, that explains it then. I think we need to simplify.
>
In fact, I don't see a case where you would not benefit from the branch type.
Furthermore, not having the branch type DOES NOT save any space in the
branch record (given we have a reserved field). So I think I prefer not having
to specify yet another cmdline option to get the branch type. In fact, if you do
not pass the option, then perf report -D reports some bogus branch types, i.e.,
not all entries have empty types.


> > >
> > >> Thanks,
> > >> Kan
> > >>
> > >>> Yet tools, such as perf, expected the
> > >>> branch type to be encoded using perf_events branch type enum
> > >>> (see tools/perf/util/branch.c). As a result of the discrepancy, the output of
> > >>> perf report -D shows bogus branch types.
> > >>>
> > >>> Fix the problem by converting the Intel raw encoding into the perf_events
> > >>> branch type enum values. With that in place and with no changes to the tools,
> > >>> the branch types are now reported properly.
> > >>>
> > >>> Signed-off-by: Stephane Eranian <[email protected]>
> > >>> ---
> > >>> arch/x86/events/intel/lbr.c | 35 ++++++++++++++++++++++++++++++++---
> > >>> 1 file changed, 32 insertions(+), 3 deletions(-)
> > >>>
> > >>> diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
> > >>> index 4f70fb6c2c1e..ef63d4d46b50 100644
> > >>> --- a/arch/x86/events/intel/lbr.c
> > >>> +++ b/arch/x86/events/intel/lbr.c
> > >>> @@ -894,9 +894,23 @@ static DEFINE_STATIC_KEY_FALSE(x86_lbr_mispred);
> > >>> static DEFINE_STATIC_KEY_FALSE(x86_lbr_cycles);
> > >>> static DEFINE_STATIC_KEY_FALSE(x86_lbr_type);
> > >>>
> > >>> -static __always_inline int get_lbr_br_type(u64 info)
> > >>> +/*
> > >>> + * Array index encodes IA32_LBR_x_INFO Branch Type Encodings
> > >>> + * as per Intel SDM Vol3b Branch Types section
> > >>> + */
> > >>> +static const int arch_lbr_type_map[]={
> > >>> + [0] = PERF_BR_COND,
> > >>> + [1] = PERF_BR_IND,
> > >>> + [2] = PERF_BR_UNCOND,
> > >>> + [3] = PERF_BR_IND_CALL,
> > >>> + [4] = PERF_BR_CALL,
> > >>> + [5] = PERF_BR_RET,
> > >>> +};
> > >>> +#define ARCH_LBR_TYPE_COUNT ARRAY_SIZE(arch_lbr_type_map)
> > >>> +
> > >>> +static __always_inline u16 get_lbr_br_type(u64 info)
> > >>> {
> > >>> - int type = 0;
> > >>> + u16 type = 0;
> > >>>
> > >>> if (static_branch_likely(&x86_lbr_type))
> > >>> type = (info & LBR_INFO_BR_TYPE) >> LBR_INFO_BR_TYPE_OFFSET;
> > >>> @@ -904,6 +918,21 @@ static __always_inline int get_lbr_br_type(u64 info)
> > >>> return type;
> > >>> }
> > >>>
> > >>> +/*
> > >>> + * The kernel cannot expose raw Intel branch type encodings because they are
> > >>> + * not generic. Instead, the function below maps the encoding to the
> > >>> + * perf_events user visible branch types.
> > >>> + */
> > >>> +static __always_inline int get_lbr_br_type_mapping(u64 info)
> > >>> +{
> > >>> + if (static_branch_likely(&x86_lbr_type)) {
> > >>> + u16 raw_type = get_lbr_br_type(info);
> > >>> + if (raw_type < ARCH_LBR_TYPE_COUNT)
> > >>> + return arch_lbr_type_map[raw_type];
> > >>> + }
> > >>> + return PERF_BR_UNKNOWN;
> > >>> +}
> > >>> +
> > >>> static __always_inline bool get_lbr_mispred(u64 info)
> > >>> {
> > >>> bool mispred = 0;
> > >>> @@ -957,7 +986,7 @@ static void intel_pmu_store_lbr(struct cpu_hw_events *cpuc,
> > >>> e->in_tx = !!(info & LBR_INFO_IN_TX);
> > >>> e->abort = !!(info & LBR_INFO_ABORT);
> > >>> e->cycles = get_lbr_cycles(info);
> > >>> - e->type = get_lbr_br_type(info);
> > >>> + e->type = get_lbr_br_type_mapping(info);
> > >>> }
> > >>>
> > >>> cpuc->lbr_stack.nr = i;

2022-08-11 17:36:56

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH] perf/x86/intel/lbr: fix branch type encoding



On 2022-08-11 11:33 a.m., Stephane Eranian wrote:
> On Thu, Aug 11, 2022 at 6:28 PM Stephane Eranian <[email protected]> wrote:
>>
>> On Thu, Aug 11, 2022 at 5:42 PM Liang, Kan <[email protected]> wrote:
>>>
>>>
>>>
>>> On 2022-08-11 10:17 a.m., Stephane Eranian wrote:
>>>> On Thu, Aug 11, 2022 at 3:23 PM Liang, Kan <[email protected]> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 2022-08-10 5:06 p.m., Stephane Eranian wrote:
>>>>>> With architected LBR, the procesosr can record the type of each sampled taken
>>>>>> branch. The type is encoded in 4-bit field in the LBR_INFO MSR of each entry.
>>>>>>
>>>>>> The branch type must then extracted and saved in the perf_branch_entry in the
>>>>>> perf_events sampling buffer. With the current code, the raw Intel encoding of
>>>>>> the branch is exported to user tools.
>>>>>
>>>>> In the intel_pmu_lbr_filter(), the raw encoding will be converted into
>>>>> the X86_BR_* format via arch_lbr_br_type_map[]. Then the
>>>>> common_branch_type() will convert the X86_BR_* format to the generic
>>>>> PERF_BR_* type and expose to user tools.
>>>>>
>>>>> I double check the existing arch_lbr_br_type_map[] and branch_map[].
>>>>> They should generate the same PERF_BR_* type as your arch_lbr_type_map[].
>>>>>
>>>>> Is there a test case which I can use to reproduce the problem?
>>>>>
>>>> I was doing a simple:
>>>> $ perf record -b -e cpu/event=0xc4/ ....
>>>> $ perf report -D
>>>> Looking at the LBR information and the BR type, many entries has no branch type.
>>>> What I see is a function where you do: e->type = get_lbr_br_type() and
>>>> that is what
>>>> is then saved in the buffer. Unless I am missing a later patch.
>>>>
>>>
>>> To get the LBR type, the save_type filter option must be applied. See
>>> 60f83fa6341d ("perf record: Create a new option save_type in
>>> --branch-filter").
>>>
>> That seems overly complicated. I don't recall having to pass a new option
>> to get the LBR latency. It showed up automatically. So why for branch_type?
>>
>>> The -b only include the ANY option. Maybe we should extend the -b option
>>> to ANY|SAVE_TYPE.
>>>
>> Ok, that explains it then. I think we need to simplify.
>>
> In fact, I don't see a case where you would not benefit from the branch type.
> Furthermore, not having the branch type DOES NOT save any space in the
> branch record (given we have a reserved field). So I think I prefer not having
> to specify yet another cmdline option to get the branch type.


I think the option is to avoid the overhead of disassembling of branch
instruction. See eb0baf8a0d92 ("perf/core: Define the common branch type
classification")
"Since the disassembling of branch instruction needs some overhead,
a new PERF_SAMPLE_BRANCH_TYPE_SAVE is introduced to indicate if it
needs to disassemble the branch instruction and record the branch
type."

I have no idea how big the overhead is. If we can always be benefit from
the branch type. I guess we can make it default on.

> In fact, if you do
> not pass the option, then perf report -D reports some bogus branch types, i.e.,
> not all entries have empty types.

Yes, that's an issue for Arch LBR. If the overhead is not a problem, it
should be gone after we make the SAVE_TYPE default. Otherwise, I think
we need a patch to clear the fields if the SAVE_TYPE is not applied.

Thanks,
Kan
>
>
>>>>
>>>>> Thanks,
>>>>> Kan
>>>>>
>>>>>> Yet tools, such as perf, expected the
>>>>>> branch type to be encoded using perf_events branch type enum
>>>>>> (see tools/perf/util/branch.c). As a result of the discrepancy, the output of
>>>>>> perf report -D shows bogus branch types.
>>>>>>
>>>>>> Fix the problem by converting the Intel raw encoding into the perf_events
>>>>>> branch type enum values. With that in place and with no changes to the tools,
>>>>>> the branch types are now reported properly.
>>>>>>
>>>>>> Signed-off-by: Stephane Eranian <[email protected]>
>>>>>> ---
>>>>>> arch/x86/events/intel/lbr.c | 35 ++++++++++++++++++++++++++++++++---
>>>>>> 1 file changed, 32 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
>>>>>> index 4f70fb6c2c1e..ef63d4d46b50 100644
>>>>>> --- a/arch/x86/events/intel/lbr.c
>>>>>> +++ b/arch/x86/events/intel/lbr.c
>>>>>> @@ -894,9 +894,23 @@ static DEFINE_STATIC_KEY_FALSE(x86_lbr_mispred);
>>>>>> static DEFINE_STATIC_KEY_FALSE(x86_lbr_cycles);
>>>>>> static DEFINE_STATIC_KEY_FALSE(x86_lbr_type);
>>>>>>
>>>>>> -static __always_inline int get_lbr_br_type(u64 info)
>>>>>> +/*
>>>>>> + * Array index encodes IA32_LBR_x_INFO Branch Type Encodings
>>>>>> + * as per Intel SDM Vol3b Branch Types section
>>>>>> + */
>>>>>> +static const int arch_lbr_type_map[]={
>>>>>> + [0] = PERF_BR_COND,
>>>>>> + [1] = PERF_BR_IND,
>>>>>> + [2] = PERF_BR_UNCOND,
>>>>>> + [3] = PERF_BR_IND_CALL,
>>>>>> + [4] = PERF_BR_CALL,
>>>>>> + [5] = PERF_BR_RET,
>>>>>> +};
>>>>>> +#define ARCH_LBR_TYPE_COUNT ARRAY_SIZE(arch_lbr_type_map)
>>>>>> +
>>>>>> +static __always_inline u16 get_lbr_br_type(u64 info)
>>>>>> {
>>>>>> - int type = 0;
>>>>>> + u16 type = 0;
>>>>>>
>>>>>> if (static_branch_likely(&x86_lbr_type))
>>>>>> type = (info & LBR_INFO_BR_TYPE) >> LBR_INFO_BR_TYPE_OFFSET;
>>>>>> @@ -904,6 +918,21 @@ static __always_inline int get_lbr_br_type(u64 info)
>>>>>> return type;
>>>>>> }
>>>>>>
>>>>>> +/*
>>>>>> + * The kernel cannot expose raw Intel branch type encodings because they are
>>>>>> + * not generic. Instead, the function below maps the encoding to the
>>>>>> + * perf_events user visible branch types.
>>>>>> + */
>>>>>> +static __always_inline int get_lbr_br_type_mapping(u64 info)
>>>>>> +{
>>>>>> + if (static_branch_likely(&x86_lbr_type)) {
>>>>>> + u16 raw_type = get_lbr_br_type(info);
>>>>>> + if (raw_type < ARCH_LBR_TYPE_COUNT)
>>>>>> + return arch_lbr_type_map[raw_type];
>>>>>> + }
>>>>>> + return PERF_BR_UNKNOWN;
>>>>>> +}
>>>>>> +
>>>>>> static __always_inline bool get_lbr_mispred(u64 info)
>>>>>> {
>>>>>> bool mispred = 0;
>>>>>> @@ -957,7 +986,7 @@ static void intel_pmu_store_lbr(struct cpu_hw_events *cpuc,
>>>>>> e->in_tx = !!(info & LBR_INFO_IN_TX);
>>>>>> e->abort = !!(info & LBR_INFO_ABORT);
>>>>>> e->cycles = get_lbr_cycles(info);
>>>>>> - e->type = get_lbr_br_type(info);
>>>>>> + e->type = get_lbr_br_type_mapping(info);
>>>>>> }
>>>>>>
>>>>>> cpuc->lbr_stack.nr = i;

2022-08-11 20:25:17

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] perf/x86/intel/lbr: fix branch type encoding


On 8/11/2022 5:33 PM, Stephane Eranian wrote:
>
> In fact, I don't see a case where you would not benefit from the branch type.
> Furthermore, not having the branch type DOES NOT save any space in the
> branch record (given we have a reserved field). So I think I prefer not having
> to specify yet another cmdline option to get the branch type. In fact, if you do
> not pass the option, then perf report -D reports some bogus branch types, i.e.,
> not all entries have empty types.

Perhaps Peter remember the details, but there was some ABI compatibility
issue that motivated the extra save type bitmap.

Also there's another reason that if you don't need them and the hardware
doesn't support it they add run time overhead because the branches need
to be decoded in the PMI.

But I agree it would be useful to make them part of -b default, as long
as there is some way to turn it off again when not needed. The extra
option is kind of a pain.

-Andi

2022-08-12 08:31:59

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] perf/x86/intel/lbr: fix branch type encoding


>
> I think the option is to avoid the overhead of disassembling of branch
> instruction. See eb0baf8a0d92 ("perf/core: Define the common branch type
> classification")
> "Since the disassembling of branch instruction needs some overhead,
> a new PERF_SAMPLE_BRANCH_TYPE_SAVE is introduced to indicate if it
> needs to disassemble the branch instruction and record the branch
> type."


Thanks for digging it out. So it was only performance.

>
> I have no idea how big the overhead is. If we can always be benefit from
> the branch type. I guess we can make it default on.

I thought even arch LBR had one case where it had to disassemble, but
perhaps it's unlikely enough because it's pre filtered. If yes it may be
ok to enable it there unconditionally at the kernel level.

Still have to decide if we want older parts to have more overhead by
default. I guess would need some data on that.


-Andi

2022-08-12 19:43:46

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH] perf/x86/intel/lbr: fix branch type encoding

Em Thu, Aug 11, 2022 at 11:56:56AM -0400, Liang, Kan escreveu:
>
>
> On 2022-08-11 11:33 a.m., Stephane Eranian wrote:
> > On Thu, Aug 11, 2022 at 6:28 PM Stephane Eranian <[email protected]> wrote:
> >>
> >> On Thu, Aug 11, 2022 at 5:42 PM Liang, Kan <[email protected]> wrote:
> >>>
> >>>
> >>>
> >>> On 2022-08-11 10:17 a.m., Stephane Eranian wrote:
> >>>> On Thu, Aug 11, 2022 at 3:23 PM Liang, Kan <[email protected]> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 2022-08-10 5:06 p.m., Stephane Eranian wrote:
> >>>>>> With architected LBR, the procesosr can record the type of each sampled taken
> >>>>>> branch. The type is encoded in 4-bit field in the LBR_INFO MSR of each entry.
> >>>>>>
> >>>>>> The branch type must then extracted and saved in the perf_branch_entry in the
> >>>>>> perf_events sampling buffer. With the current code, the raw Intel encoding of
> >>>>>> the branch is exported to user tools.
> >>>>>
> >>>>> In the intel_pmu_lbr_filter(), the raw encoding will be converted into
> >>>>> the X86_BR_* format via arch_lbr_br_type_map[]. Then the
> >>>>> common_branch_type() will convert the X86_BR_* format to the generic
> >>>>> PERF_BR_* type and expose to user tools.
> >>>>>
> >>>>> I double check the existing arch_lbr_br_type_map[] and branch_map[].
> >>>>> They should generate the same PERF_BR_* type as your arch_lbr_type_map[].
> >>>>>
> >>>>> Is there a test case which I can use to reproduce the problem?
> >>>>>
> >>>> I was doing a simple:
> >>>> $ perf record -b -e cpu/event=0xc4/ ....
> >>>> $ perf report -D
> >>>> Looking at the LBR information and the BR type, many entries has no branch type.
> >>>> What I see is a function where you do: e->type = get_lbr_br_type() and
> >>>> that is what
> >>>> is then saved in the buffer. Unless I am missing a later patch.
> >>>>
> >>>
> >>> To get the LBR type, the save_type filter option must be applied. See
> >>> 60f83fa6341d ("perf record: Create a new option save_type in
> >>> --branch-filter").
> >>>
> >> That seems overly complicated. I don't recall having to pass a new option
> >> to get the LBR latency. It showed up automatically. So why for branch_type?
> >>
> >>> The -b only include the ANY option. Maybe we should extend the -b option
> >>> to ANY|SAVE_TYPE.
> >>>
> >> Ok, that explains it then. I think we need to simplify.
> >>
> > In fact, I don't see a case where you would not benefit from the branch type.
> > Furthermore, not having the branch type DOES NOT save any space in the
> > branch record (given we have a reserved field). So I think I prefer not having
> > to specify yet another cmdline option to get the branch type.
>
>
> I think the option is to avoid the overhead of disassembling of branch
> instruction. See eb0baf8a0d92 ("perf/core: Define the common branch type
> classification")
> "Since the disassembling of branch instruction needs some overhead,
> a new PERF_SAMPLE_BRANCH_TYPE_SAVE is introduced to indicate if it
> needs to disassemble the branch instruction and record the branch
> type."
>
> I have no idea how big the overhead is. If we can always be benefit from
> the branch type. I guess we can make it default on.

Would you be so nice as to add a paragraph with such explanation to the
man page so that what happened to Stephane doesn't trip other people?

- Arnaldo

> > In fact, if you do
> > not pass the option, then perf report -D reports some bogus branch types, i.e.,
> > not all entries have empty types.
>
> Yes, that's an issue for Arch LBR. If the overhead is not a problem, it
> should be gone after we make the SAVE_TYPE default. Otherwise, I think
> we need a patch to clear the fields if the SAVE_TYPE is not applied.
>
> Thanks,
> Kan
> >
> >
> >>>>
> >>>>> Thanks,
> >>>>> Kan
> >>>>>
> >>>>>> Yet tools, such as perf, expected the
> >>>>>> branch type to be encoded using perf_events branch type enum
> >>>>>> (see tools/perf/util/branch.c). As a result of the discrepancy, the output of
> >>>>>> perf report -D shows bogus branch types.
> >>>>>>
> >>>>>> Fix the problem by converting the Intel raw encoding into the perf_events
> >>>>>> branch type enum values. With that in place and with no changes to the tools,
> >>>>>> the branch types are now reported properly.
> >>>>>>
> >>>>>> Signed-off-by: Stephane Eranian <[email protected]>
> >>>>>> ---
> >>>>>> arch/x86/events/intel/lbr.c | 35 ++++++++++++++++++++++++++++++++---
> >>>>>> 1 file changed, 32 insertions(+), 3 deletions(-)
> >>>>>>
> >>>>>> diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
> >>>>>> index 4f70fb6c2c1e..ef63d4d46b50 100644
> >>>>>> --- a/arch/x86/events/intel/lbr.c
> >>>>>> +++ b/arch/x86/events/intel/lbr.c
> >>>>>> @@ -894,9 +894,23 @@ static DEFINE_STATIC_KEY_FALSE(x86_lbr_mispred);
> >>>>>> static DEFINE_STATIC_KEY_FALSE(x86_lbr_cycles);
> >>>>>> static DEFINE_STATIC_KEY_FALSE(x86_lbr_type);
> >>>>>>
> >>>>>> -static __always_inline int get_lbr_br_type(u64 info)
> >>>>>> +/*
> >>>>>> + * Array index encodes IA32_LBR_x_INFO Branch Type Encodings
> >>>>>> + * as per Intel SDM Vol3b Branch Types section
> >>>>>> + */
> >>>>>> +static const int arch_lbr_type_map[]={
> >>>>>> + [0] = PERF_BR_COND,
> >>>>>> + [1] = PERF_BR_IND,
> >>>>>> + [2] = PERF_BR_UNCOND,
> >>>>>> + [3] = PERF_BR_IND_CALL,
> >>>>>> + [4] = PERF_BR_CALL,
> >>>>>> + [5] = PERF_BR_RET,
> >>>>>> +};
> >>>>>> +#define ARCH_LBR_TYPE_COUNT ARRAY_SIZE(arch_lbr_type_map)
> >>>>>> +
> >>>>>> +static __always_inline u16 get_lbr_br_type(u64 info)
> >>>>>> {
> >>>>>> - int type = 0;
> >>>>>> + u16 type = 0;
> >>>>>>
> >>>>>> if (static_branch_likely(&x86_lbr_type))
> >>>>>> type = (info & LBR_INFO_BR_TYPE) >> LBR_INFO_BR_TYPE_OFFSET;
> >>>>>> @@ -904,6 +918,21 @@ static __always_inline int get_lbr_br_type(u64 info)
> >>>>>> return type;
> >>>>>> }
> >>>>>>
> >>>>>> +/*
> >>>>>> + * The kernel cannot expose raw Intel branch type encodings because they are
> >>>>>> + * not generic. Instead, the function below maps the encoding to the
> >>>>>> + * perf_events user visible branch types.
> >>>>>> + */
> >>>>>> +static __always_inline int get_lbr_br_type_mapping(u64 info)
> >>>>>> +{
> >>>>>> + if (static_branch_likely(&x86_lbr_type)) {
> >>>>>> + u16 raw_type = get_lbr_br_type(info);
> >>>>>> + if (raw_type < ARCH_LBR_TYPE_COUNT)
> >>>>>> + return arch_lbr_type_map[raw_type];
> >>>>>> + }
> >>>>>> + return PERF_BR_UNKNOWN;
> >>>>>> +}
> >>>>>> +
> >>>>>> static __always_inline bool get_lbr_mispred(u64 info)
> >>>>>> {
> >>>>>> bool mispred = 0;
> >>>>>> @@ -957,7 +986,7 @@ static void intel_pmu_store_lbr(struct cpu_hw_events *cpuc,
> >>>>>> e->in_tx = !!(info & LBR_INFO_IN_TX);
> >>>>>> e->abort = !!(info & LBR_INFO_ABORT);
> >>>>>> e->cycles = get_lbr_cycles(info);
> >>>>>> - e->type = get_lbr_br_type(info);
> >>>>>> + e->type = get_lbr_br_type_mapping(info);
> >>>>>> }
> >>>>>>
> >>>>>> cpuc->lbr_stack.nr = i;

--

- Arnaldo

2022-08-14 19:56:38

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH] perf/x86/intel/lbr: fix branch type encoding



On 2022-08-12 4:16 a.m., Andi Kleen wrote:
>
>>
>> I think the option is to avoid the overhead of disassembling of branch
>> instruction. See eb0baf8a0d92 ("perf/core: Define the common branch type
>> classification")
>> "Since the disassembling of branch instruction needs some overhead,
>> a new PERF_SAMPLE_BRANCH_TYPE_SAVE is introduced to indicate if it
>> needs to disassemble the branch instruction and record the branch
>> type."
>
>
> Thanks for digging it out. So it was only performance.
>
>>
>> I have no idea how big the overhead is. If we can always be benefit from
>> the branch type. I guess we can make it default on.
>
> I thought even arch LBR had one case where it had to disassemble, but
> perhaps it's unlikely enough because it's pre filtered. If yes it may be
> ok to enable it there unconditionally at the kernel level.
>

Yes, Arch LBR should have much less overhead than the previous
platforms. The most common branches, JCC and near JMP/CALL, are from the
HW. Only the other branches, e.g., far call, SYS* etc, which still rely
on the SW disassemble. The number of the other branches should not be
big. I agree that we should enable the branch type for the Arch LBR
unconditionally at the kernel level.

Peter? Stephane? What do you think?

> Still have to decide if we want older parts to have more overhead by
> default. I guess would need some data on that.


The previous LBR already has high overhead. The branch type overhead
will make it worse. I think it's better keep it default off. I think we
can make it clear in the document that the branch type is only default
on for the new platforms with Arch LBR support (12th-Gen+ client or
4th-Gen Xeon+ server).

Thanks,
Kan

2022-08-14 19:56:38

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH] perf/x86/intel/lbr: fix branch type encoding



On 2022-08-12 3:35 p.m., Arnaldo Carvalho de Melo wrote:
> Em Thu, Aug 11, 2022 at 11:56:56AM -0400, Liang, Kan escreveu:
>>
>>
>> On 2022-08-11 11:33 a.m., Stephane Eranian wrote:
>>> On Thu, Aug 11, 2022 at 6:28 PM Stephane Eranian <[email protected]> wrote:
>>>>
>>>> On Thu, Aug 11, 2022 at 5:42 PM Liang, Kan <[email protected]> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 2022-08-11 10:17 a.m., Stephane Eranian wrote:
>>>>>> On Thu, Aug 11, 2022 at 3:23 PM Liang, Kan <[email protected]> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 2022-08-10 5:06 p.m., Stephane Eranian wrote:
>>>>>>>> With architected LBR, the procesosr can record the type of each sampled taken
>>>>>>>> branch. The type is encoded in 4-bit field in the LBR_INFO MSR of each entry.
>>>>>>>>
>>>>>>>> The branch type must then extracted and saved in the perf_branch_entry in the
>>>>>>>> perf_events sampling buffer. With the current code, the raw Intel encoding of
>>>>>>>> the branch is exported to user tools.
>>>>>>>
>>>>>>> In the intel_pmu_lbr_filter(), the raw encoding will be converted into
>>>>>>> the X86_BR_* format via arch_lbr_br_type_map[]. Then the
>>>>>>> common_branch_type() will convert the X86_BR_* format to the generic
>>>>>>> PERF_BR_* type and expose to user tools.
>>>>>>>
>>>>>>> I double check the existing arch_lbr_br_type_map[] and branch_map[].
>>>>>>> They should generate the same PERF_BR_* type as your arch_lbr_type_map[].
>>>>>>>
>>>>>>> Is there a test case which I can use to reproduce the problem?
>>>>>>>
>>>>>> I was doing a simple:
>>>>>> $ perf record -b -e cpu/event=0xc4/ ....
>>>>>> $ perf report -D
>>>>>> Looking at the LBR information and the BR type, many entries has no branch type.
>>>>>> What I see is a function where you do: e->type = get_lbr_br_type() and
>>>>>> that is what
>>>>>> is then saved in the buffer. Unless I am missing a later patch.
>>>>>>
>>>>>
>>>>> To get the LBR type, the save_type filter option must be applied. See
>>>>> 60f83fa6341d ("perf record: Create a new option save_type in
>>>>> --branch-filter").
>>>>>
>>>> That seems overly complicated. I don't recall having to pass a new option
>>>> to get the LBR latency. It showed up automatically. So why for branch_type?
>>>>
>>>>> The -b only include the ANY option. Maybe we should extend the -b option
>>>>> to ANY|SAVE_TYPE.
>>>>>
>>>> Ok, that explains it then. I think we need to simplify.
>>>>
>>> In fact, I don't see a case where you would not benefit from the branch type.
>>> Furthermore, not having the branch type DOES NOT save any space in the
>>> branch record (given we have a reserved field). So I think I prefer not having
>>> to specify yet another cmdline option to get the branch type.
>>
>>
>> I think the option is to avoid the overhead of disassembling of branch
>> instruction. See eb0baf8a0d92 ("perf/core: Define the common branch type
>> classification")
>> "Since the disassembling of branch instruction needs some overhead,
>> a new PERF_SAMPLE_BRANCH_TYPE_SAVE is introduced to indicate if it
>> needs to disassemble the branch instruction and record the branch
>> type."
>>
>> I have no idea how big the overhead is. If we can always be benefit from
>> the branch type. I guess we can make it default on.
>
> Would you be so nice as to add a paragraph with such explanation to the
> man page so that what happened to Stephane doesn't trip other people?
>

Sure, I will update the document once we decide whether enable the
branch type by default in the kernel for Arch LBR.

Thanks,
Kan
> - Arnaldo
>
>>> In fact, if you do
>>> not pass the option, then perf report -D reports some bogus branch types, i.e.,
>>> not all entries have empty types.
>>
>> Yes, that's an issue for Arch LBR. If the overhead is not a problem, it
>> should be gone after we make the SAVE_TYPE default. Otherwise, I think
>> we need a patch to clear the fields if the SAVE_TYPE is not applied.
>>
>> Thanks,
>> Kan
>>>
>>>
>>>>>>
>>>>>>> Thanks,
>>>>>>> Kan
>>>>>>>
>>>>>>>> Yet tools, such as perf, expected the
>>>>>>>> branch type to be encoded using perf_events branch type enum
>>>>>>>> (see tools/perf/util/branch.c). As a result of the discrepancy, the output of
>>>>>>>> perf report -D shows bogus branch types.
>>>>>>>>
>>>>>>>> Fix the problem by converting the Intel raw encoding into the perf_events
>>>>>>>> branch type enum values. With that in place and with no changes to the tools,
>>>>>>>> the branch types are now reported properly.
>>>>>>>>
>>>>>>>> Signed-off-by: Stephane Eranian <[email protected]>
>>>>>>>> ---
>>>>>>>> arch/x86/events/intel/lbr.c | 35 ++++++++++++++++++++++++++++++++---
>>>>>>>> 1 file changed, 32 insertions(+), 3 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
>>>>>>>> index 4f70fb6c2c1e..ef63d4d46b50 100644
>>>>>>>> --- a/arch/x86/events/intel/lbr.c
>>>>>>>> +++ b/arch/x86/events/intel/lbr.c
>>>>>>>> @@ -894,9 +894,23 @@ static DEFINE_STATIC_KEY_FALSE(x86_lbr_mispred);
>>>>>>>> static DEFINE_STATIC_KEY_FALSE(x86_lbr_cycles);
>>>>>>>> static DEFINE_STATIC_KEY_FALSE(x86_lbr_type);
>>>>>>>>
>>>>>>>> -static __always_inline int get_lbr_br_type(u64 info)
>>>>>>>> +/*
>>>>>>>> + * Array index encodes IA32_LBR_x_INFO Branch Type Encodings
>>>>>>>> + * as per Intel SDM Vol3b Branch Types section
>>>>>>>> + */
>>>>>>>> +static const int arch_lbr_type_map[]={
>>>>>>>> + [0] = PERF_BR_COND,
>>>>>>>> + [1] = PERF_BR_IND,
>>>>>>>> + [2] = PERF_BR_UNCOND,
>>>>>>>> + [3] = PERF_BR_IND_CALL,
>>>>>>>> + [4] = PERF_BR_CALL,
>>>>>>>> + [5] = PERF_BR_RET,
>>>>>>>> +};
>>>>>>>> +#define ARCH_LBR_TYPE_COUNT ARRAY_SIZE(arch_lbr_type_map)
>>>>>>>> +
>>>>>>>> +static __always_inline u16 get_lbr_br_type(u64 info)
>>>>>>>> {
>>>>>>>> - int type = 0;
>>>>>>>> + u16 type = 0;
>>>>>>>>
>>>>>>>> if (static_branch_likely(&x86_lbr_type))
>>>>>>>> type = (info & LBR_INFO_BR_TYPE) >> LBR_INFO_BR_TYPE_OFFSET;
>>>>>>>> @@ -904,6 +918,21 @@ static __always_inline int get_lbr_br_type(u64 info)
>>>>>>>> return type;
>>>>>>>> }
>>>>>>>>
>>>>>>>> +/*
>>>>>>>> + * The kernel cannot expose raw Intel branch type encodings because they are
>>>>>>>> + * not generic. Instead, the function below maps the encoding to the
>>>>>>>> + * perf_events user visible branch types.
>>>>>>>> + */
>>>>>>>> +static __always_inline int get_lbr_br_type_mapping(u64 info)
>>>>>>>> +{
>>>>>>>> + if (static_branch_likely(&x86_lbr_type)) {
>>>>>>>> + u16 raw_type = get_lbr_br_type(info);
>>>>>>>> + if (raw_type < ARCH_LBR_TYPE_COUNT)
>>>>>>>> + return arch_lbr_type_map[raw_type];
>>>>>>>> + }
>>>>>>>> + return PERF_BR_UNKNOWN;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> static __always_inline bool get_lbr_mispred(u64 info)
>>>>>>>> {
>>>>>>>> bool mispred = 0;
>>>>>>>> @@ -957,7 +986,7 @@ static void intel_pmu_store_lbr(struct cpu_hw_events *cpuc,
>>>>>>>> e->in_tx = !!(info & LBR_INFO_IN_TX);
>>>>>>>> e->abort = !!(info & LBR_INFO_ABORT);
>>>>>>>> e->cycles = get_lbr_cycles(info);
>>>>>>>> - e->type = get_lbr_br_type(info);
>>>>>>>> + e->type = get_lbr_br_type_mapping(info);
>>>>>>>> }
>>>>>>>>
>>>>>>>> cpuc->lbr_stack.nr = i;
>

2022-08-16 00:49:11

by Stephane Eranian

[permalink] [raw]
Subject: Re: [PATCH] perf/x86/intel/lbr: fix branch type encoding

On Sun, Aug 14, 2022 at 12:37 PM Liang, Kan <[email protected]> wrote:
>
>
>
> On 2022-08-12 4:16 a.m., Andi Kleen wrote:
> >
> >>
> >> I think the option is to avoid the overhead of disassembling of branch
> >> instruction. See eb0baf8a0d92 ("perf/core: Define the common branch type
> >> classification")
> >> "Since the disassembling of branch instruction needs some overhead,
> >> a new PERF_SAMPLE_BRANCH_TYPE_SAVE is introduced to indicate if it
> >> needs to disassemble the branch instruction and record the branch
> >> type."
> >
> >
> > Thanks for digging it out. So it was only performance.
> >
> >>
> >> I have no idea how big the overhead is. If we can always be benefit from
> >> the branch type. I guess we can make it default on.
> >
> > I thought even arch LBR had one case where it had to disassemble, but
> > perhaps it's unlikely enough because it's pre filtered. If yes it may be
> > ok to enable it there unconditionally at the kernel level.
> >
>
> Yes, Arch LBR should have much less overhead than the previous
> platforms. The most common branches, JCC and near JMP/CALL, are from the
> HW. Only the other branches, e.g., far call, SYS* etc, which still rely
> on the SW disassemble. The number of the other branches should not be
> big. I agree that we should enable the branch type for the Arch LBR
> unconditionally at the kernel level.
>
> Peter? Stephane? What do you think?
>
> > Still have to decide if we want older parts to have more overhead by
> > default. I guess would need some data on that.
>
I don't think you want that. It is okay to have it when it is free. Otherwise it
is best if it remains opt-in.
>
> The previous LBR already has high overhead. The branch type overhead
> will make it worse. I think it's better keep it default off. I think we
> can make it clear in the document that the branch type is only default
> on for the new platforms with Arch LBR support (12th-Gen+ client or
> 4th-Gen Xeon+ server).
>
I am okay with that.

2022-08-16 00:55:39

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH] perf/x86/intel/lbr: fix branch type encoding



On 2022-08-15 3:45 p.m., Stephane Eranian wrote:
> On Sun, Aug 14, 2022 at 12:37 PM Liang, Kan <[email protected]> wrote:
>>
>>
>>
>> On 2022-08-12 4:16 a.m., Andi Kleen wrote:
>>>
>>>>
>>>> I think the option is to avoid the overhead of disassembling of branch
>>>> instruction. See eb0baf8a0d92 ("perf/core: Define the common branch type
>>>> classification")
>>>> "Since the disassembling of branch instruction needs some overhead,
>>>> a new PERF_SAMPLE_BRANCH_TYPE_SAVE is introduced to indicate if it
>>>> needs to disassemble the branch instruction and record the branch
>>>> type."
>>>
>>>
>>> Thanks for digging it out. So it was only performance.
>>>
>>>>
>>>> I have no idea how big the overhead is. If we can always be benefit from
>>>> the branch type. I guess we can make it default on.
>>>
>>> I thought even arch LBR had one case where it had to disassemble, but
>>> perhaps it's unlikely enough because it's pre filtered. If yes it may be
>>> ok to enable it there unconditionally at the kernel level.
>>>
>>
>> Yes, Arch LBR should have much less overhead than the previous
>> platforms. The most common branches, JCC and near JMP/CALL, are from the
>> HW. Only the other branches, e.g., far call, SYS* etc, which still rely
>> on the SW disassemble. The number of the other branches should not be
>> big. I agree that we should enable the branch type for the Arch LBR
>> unconditionally at the kernel level.
>>
>> Peter? Stephane? What do you think?
>>
>>> Still have to decide if we want older parts to have more overhead by
>>> default. I guess would need some data on that.
>>
> I don't think you want that. It is okay to have it when it is free. Otherwise it
> is best if it remains opt-in.>>
>> The previous LBR already has high overhead. The branch type overhead
>> will make it worse. I think it's better keep it default off. I think we
>> can make it clear in the document that the branch type is only default
>> on for the new platforms with Arch LBR support (12th-Gen+ client or
>> 4th-Gen Xeon+ server).
>>
> I am okay with that.


I see.

I will post a kernel patch to enable the branch type for Arch LBR by
default, and also update the perf tool document as Arnaldo suggested.

Thanks,
Kan