2022-11-12 16:05:34

by Adrian Hunter

Subject: [PATCH] perf/x86/intel/pt: Fix sampling using single range output

Deal with errata TGL052, ADL037 and RPL017 "Trace May Contain Incorrect
Data When Configured With Single Range Output Larger Than 4KB" by
disabling single range output whenever larger than 4KB.

Fixes: 670638477aed ("perf/x86/intel/pt: Opportunistically use single range output mode")
Cc: [email protected]
Signed-off-by: Adrian Hunter <[email protected]>
---
arch/x86/events/intel/pt.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 82ef87e9a897..42a55794004a 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -1263,6 +1263,15 @@ static int pt_buffer_try_single(struct pt_buffer *buf, int nr_pages)
 	if (1 << order != nr_pages)
 		goto out;
 
+	/*
+	 * Some processors cannot always support single range for more than
+	 * 4KB - refer errata TGL052, ADL037 and RPL017. Future processors might
+	 * also be affected, so for now rather than trying to keep track of
+	 * which ones, just disable it for all.
+	 */
+	if (nr_pages > 1)
+		goto out;
+
 	buf->single = true;
 	buf->nr_pages = nr_pages;
 	ret = 0;
--
2.34.1
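
For readers who want to see the effect of the added check in isolation, here is a minimal user-space sketch. It models only the checks visible in the diff context above. The names `pt_buffer_model` and `try_single_model`, the explicit `order` parameter and the `-1` return value are assumptions made for illustration; they are not the kernel's actual definitions, and the real function performs further checks that the diff does not show.

```c
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct pt_buffer (illustration only). */
struct pt_buffer_model {
	bool single;	/* true when single range output is selected */
	int nr_pages;	/* buffer size in pages */
};

/*
 * Models only the checks visible in the diff context.
 * 'order' stands for the allocation order of the first page, i.e. 2^order
 * pages are physically contiguous. Returns 0 when single range output is
 * selected and -1 otherwise; -1 is a placeholder, not the kernel's error code.
 */
int try_single_model(struct pt_buffer_model *buf, int order, int nr_pages)
{
	/* The whole buffer must be one contiguous allocation. */
	if (1 << order != nr_pages)
		return -1;

	/* Errata workaround: refuse single range output beyond one page. */
	if (nr_pages > 1)
		return -1;

	buf->single = true;
	buf->nr_pages = nr_pages;
	return 0;
}
```

With 4KB pages, a one-page buffer (`order` 0, `nr_pages` 1) still selects single range output, while a contiguous two-page buffer (`order` 1, `nr_pages` 2) no longer does.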



2022-11-14 11:20:34

by Adrian Hunter

Subject: Re: [PATCH] perf/x86/intel/pt: Fix sampling using single range output

On 14/11/22 12:51, Peter Zijlstra wrote:
> On Sat, Nov 12, 2022 at 05:15:08PM +0200, Adrian Hunter wrote:
>> Deal with errata TGL052, ADL037 and RPL017 "Trace May Contain Incorrect
>> Data When Configured With Single Range Output Larger Than 4KB" by
>> disabling single range output whenever larger than 4KB.
>>
>> Fixes: 670638477aed ("perf/x86/intel/pt: Opportunistically use single range output mode")
>> Cc: [email protected]
>> Signed-off-by: Adrian Hunter <[email protected]>
>> ---
>> arch/x86/events/intel/pt.c | 9 +++++++++
>> 1 file changed, 9 insertions(+)
>>
>> diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
>> index 82ef87e9a897..42a55794004a 100644
>> --- a/arch/x86/events/intel/pt.c
>> +++ b/arch/x86/events/intel/pt.c
>> @@ -1263,6 +1263,15 @@ static int pt_buffer_try_single(struct pt_buffer *buf, int nr_pages)
>> if (1 << order != nr_pages)
>> goto out;
>>
>> + /*
>> + * Some processors cannot always support single range for more than
>> + * 4KB - refer errata TGL052, ADL037 and RPL017. Future processors might
>> + * also be affected, so for now rather than trying to keep track of
>> + * which ones, just disable it for all.
>> + */
>> + if (nr_pages > 1)
>> + goto out;
>
> This effectively declares single-output-mode dead? Because I don't think
> anybody uses PT with a single 4K buffer.

4K is the default size for "sample mode", i.e. stuffing 4KB of Intel PT trace
data into a PERF_RECORD_SAMPLE record that has sample_type bit PERF_SAMPLE_AUX set.

e.g.

$ perf record -vv --aux-sample -e '{intel_pt//u,cycles:u}' uname 2>err.txt
Linux
$ grep aux_sample_size err.txt
aux_sample_size 4096
$


2022-11-14 11:30:12

by Peter Zijlstra

Subject: Re: [PATCH] perf/x86/intel/pt: Fix sampling using single range output

On Sat, Nov 12, 2022 at 05:15:08PM +0200, Adrian Hunter wrote:
> Deal with errata TGL052, ADL037 and RPL017 "Trace May Contain Incorrect
> Data When Configured With Single Range Output Larger Than 4KB" by
> disabling single range output whenever larger than 4KB.
>
> Fixes: 670638477aed ("perf/x86/intel/pt: Opportunistically use single range output mode")
> Cc: [email protected]
> Signed-off-by: Adrian Hunter <[email protected]>
> ---
> arch/x86/events/intel/pt.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
> index 82ef87e9a897..42a55794004a 100644
> --- a/arch/x86/events/intel/pt.c
> +++ b/arch/x86/events/intel/pt.c
> @@ -1263,6 +1263,15 @@ static int pt_buffer_try_single(struct pt_buffer *buf, int nr_pages)
> if (1 << order != nr_pages)
> goto out;
>
> + /*
> + * Some processors cannot always support single range for more than
> + * 4KB - refer errata TGL052, ADL037 and RPL017. Future processors might
> + * also be affected, so for now rather than trying to keep track of
> + * which ones, just disable it for all.
> + */
> + if (nr_pages > 1)
> + goto out;

This effectively declares single-output-mode dead? Because I don't think
anybody uses PT with a single 4K buffer.

2022-11-14 17:13:07

by Peter Zijlstra

Subject: Re: [PATCH] perf/x86/intel/pt: Fix sampling using single range output

On Mon, Nov 14, 2022 at 01:10:38PM +0200, Adrian Hunter wrote:
> On 14/11/22 12:51, Peter Zijlstra wrote:
> > On Sat, Nov 12, 2022 at 05:15:08PM +0200, Adrian Hunter wrote:
> >> Deal with errata TGL052, ADL037 and RPL017 "Trace May Contain Incorrect
> >> Data When Configured With Single Range Output Larger Than 4KB" by
> >> disabling single range output whenever larger than 4KB.
> >>
> >> Fixes: 670638477aed ("perf/x86/intel/pt: Opportunistically use single range output mode")
> >> Cc: [email protected]
> >> Signed-off-by: Adrian Hunter <[email protected]>
> >> ---
> >> arch/x86/events/intel/pt.c | 9 +++++++++
> >> 1 file changed, 9 insertions(+)
> >>
> >> diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
> >> index 82ef87e9a897..42a55794004a 100644
> >> --- a/arch/x86/events/intel/pt.c
> >> +++ b/arch/x86/events/intel/pt.c
> >> @@ -1263,6 +1263,15 @@ static int pt_buffer_try_single(struct pt_buffer *buf, int nr_pages)
> >> if (1 << order != nr_pages)
> >> goto out;
> >>
> >> + /*
> >> + * Some processors cannot always support single range for more than
> >> + * 4KB - refer errata TGL052, ADL037 and RPL017. Future processors might
> >> + * also be affected, so for now rather than trying to keep track of
> >> + * which ones, just disable it for all.
> >> + */
> >> + if (nr_pages > 1)
> >> + goto out;
> >
> > This effectively declares single-output-mode dead? Because I don't think
> > anybody uses PT with a single 4K buffer.
>
> 4K is the default size for "sample mode" i.e. stuffing 4KB of Intel PT trace
> data into a PERF_RECORD_SAMPLE record that has sample_type bit PERF_SAMPLE_AUX
>
> e.g.
>
> $ perf record -vv --aux-sample -e '{intel_pt//u,cycles:u}' uname 2>err.txt
> Linux
> $ grep aux_sample_size err.txt
> aux_sample_size 4096

Ah, ok. Not as bad then. Anyway, I'll go queue it for perf/urgent I
suppose.

2022-11-15 20:21:53

by Andi Kleen

Subject: Re: [PATCH] perf/x86/intel/pt: Fix sampling using single range output

Peter Zijlstra <[email protected]> writes:

> On Mon, Nov 14, 2022 at 01:10:38PM +0200, Adrian Hunter wrote:
>> On 14/11/22 12:51, Peter Zijlstra wrote:
>> > On Sat, Nov 12, 2022 at 05:15:08PM +0200, Adrian Hunter wrote:
>> >> Deal with errata TGL052, ADL037 and RPL017 "Trace May Contain Incorrect
>> >> Data When Configured With Single Range Output Larger Than 4KB" by
>> >> disabling single range output whenever larger than 4KB.
>> >>
>> >> Fixes: 670638477aed ("perf/x86/intel/pt: Opportunistically use single range output mode")
>> >> Cc: [email protected]
>> >> Signed-off-by: Adrian Hunter <[email protected]>
>> >> ---
>> >> arch/x86/events/intel/pt.c | 9 +++++++++
>> >> 1 file changed, 9 insertions(+)
>> >>
>> >> diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
>> >> index 82ef87e9a897..42a55794004a 100644
>> >> --- a/arch/x86/events/intel/pt.c
>> >> +++ b/arch/x86/events/intel/pt.c
>> >> @@ -1263,6 +1263,15 @@ static int pt_buffer_try_single(struct pt_buffer *buf, int nr_pages)
>> >> if (1 << order != nr_pages)
>> >> goto out;
>> >>
>> >> + /*
>> >> + * Some processors cannot always support single range for more than
>> >> + * 4KB - refer errata TGL052, ADL037 and RPL017. Future processors might
>> >> + * also be affected, so for now rather than trying to keep track of
>> >> + * which ones, just disable it for all.
>> >> + */
>> >> + if (nr_pages > 1)
>> >> + goto out;
>> >
>> > This effectively declares single-output-mode dead? Because I don't think
>> > anybody uses PT with a single 4K buffer.
>>
>> 4K is the default size for "sample mode" i.e. stuffing 4KB of Intel PT trace
>> data into a PERF_RECORD_SAMPLE record that has sample_type bit PERF_SAMPLE_AUX
>>
>> e.g.
>>
>> $ perf record -vv --aux-sample -e '{intel_pt//u,cycles:u}' uname 2>err.txt
>> Linux
>> $ grep aux_sample_size err.txt
>> aux_sample_size 4096
>
> Ah, ok. Not as bad then. Anyway, I'll go queue it for perf/urgent I
> suppose.

It would be better to only limit on the CPUs with the bug because
switching buffers causes some extra latencies. So this patch may regress
PT overhead or tail latencies.

-Andi

2022-11-16 06:59:19

by Adrian Hunter

Subject: Re: [PATCH] perf/x86/intel/pt: Fix sampling using single range output

On 15/11/22 21:46, Andi Kleen wrote:
> Peter Zijlstra <[email protected]> writes:
>
>> On Mon, Nov 14, 2022 at 01:10:38PM +0200, Adrian Hunter wrote:
>>> On 14/11/22 12:51, Peter Zijlstra wrote:
>>>> On Sat, Nov 12, 2022 at 05:15:08PM +0200, Adrian Hunter wrote:
>>>>> Deal with errata TGL052, ADL037 and RPL017 "Trace May Contain Incorrect
>>>>> Data When Configured With Single Range Output Larger Than 4KB" by
>>>>> disabling single range output whenever larger than 4KB.
>>>>>
>>>>> Fixes: 670638477aed ("perf/x86/intel/pt: Opportunistically use single range output mode")
>>>>> Cc: [email protected]
>>>>> Signed-off-by: Adrian Hunter <[email protected]>
>>>>> ---
>>>>> arch/x86/events/intel/pt.c | 9 +++++++++
>>>>> 1 file changed, 9 insertions(+)
>>>>>
>>>>> diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
>>>>> index 82ef87e9a897..42a55794004a 100644
>>>>> --- a/arch/x86/events/intel/pt.c
>>>>> +++ b/arch/x86/events/intel/pt.c
>>>>> @@ -1263,6 +1263,15 @@ static int pt_buffer_try_single(struct pt_buffer *buf, int nr_pages)
>>>>> if (1 << order != nr_pages)
>>>>> goto out;
>>>>>
>>>>> + /*
>>>>> + * Some processors cannot always support single range for more than
>>>>> + * 4KB - refer errata TGL052, ADL037 and RPL017. Future processors might
>>>>> + * also be affected, so for now rather than trying to keep track of
>>>>> + * which ones, just disable it for all.
>>>>> + */
>>>>> + if (nr_pages > 1)
>>>>> + goto out;
>>>>
>>>> This effectively declares single-output-mode dead? Because I don't think
>>>> anybody uses PT with a single 4K buffer.
>>>
>>> 4K is the default size for "sample mode" i.e. stuffing 4KB of Intel PT trace
>>> data into a PERF_RECORD_SAMPLE record that has sample_type bit PERF_SAMPLE_AUX
>>>
>>> e.g.
>>>
>>> $ perf record -vv --aux-sample -e '{intel_pt//u,cycles:u}' uname 2>err.txt
>>> Linux
>>> $ grep aux_sample_size err.txt
>>> aux_sample_size 4096
>>
>> Ah, ok. Not as bad then. Anyway, I'll go queue it for perf/urgent I
>> suppose.
>
> It would be better to only limit on the CPUs with the bug because
> switching buffers causes some extra latencies. So this patch may regress
> PT overhead or tail latencies.

I could whitelist CPUs that do not have the issue, because a blacklist
would keep expanding, which would be a bit of a pain to maintain.
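
A rough sketch of what such a whitelist could look like, written as plain user-space C rather than kernel code. Everything here is an assumption for illustration: the function names are made up, and the model numbers in the table are placeholders that say nothing about which processors are actually unaffected. In the kernel this would more likely be expressed with the existing CPU matching tables instead of a hand-rolled loop.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Placeholder model numbers, purely hypothetical. They do NOT identify
 * real processors that are known to be free of the erratum.
 */
static const unsigned int single_range_ok_models[] = { 0x01, 0x02 };

/* Returns true if 'model' is on the (hypothetical) whitelist. */
bool model_allows_large_single_range(unsigned int model)
{
	size_t n = sizeof(single_range_ok_models) /
		   sizeof(single_range_ok_models[0]);

	for (size_t i = 0; i < n; i++) {
		if (single_range_ok_models[i] == model)
			return true;
	}
	return false;
}

/*
 * One page is always acceptable; anything larger is allowed only on
 * whitelisted models, so unknown and future CPUs default to the safe path.
 */
bool single_range_allowed(unsigned int model, int nr_pages)
{
	if (nr_pages <= 1)
		return true;
	return model_allows_large_single_range(model);
}
```

The point of the whitelist direction is that a model missing from the table falls back to the restricted behaviour, so forgetting to update the list costs some overhead rather than trace correctness.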


2022-11-16 10:00:29

by tip-bot2 for Jacob Pan

Subject: [tip: perf/urgent] perf/x86/intel/pt: Fix sampling using single range output

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID: ce0d998be9274dd3a3d971cbeaa6fe28fd2c3062
Gitweb: https://git.kernel.org/tip/ce0d998be9274dd3a3d971cbeaa6fe28fd2c3062
Author: Adrian Hunter <[email protected]>
AuthorDate: Sat, 12 Nov 2022 17:15:08 +02:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 16 Nov 2022 10:12:59 +01:00

perf/x86/intel/pt: Fix sampling using single range output

Deal with errata TGL052, ADL037 and RPL017 "Trace May Contain Incorrect
Data When Configured With Single Range Output Larger Than 4KB" by
disabling single range output whenever larger than 4KB.

Fixes: 670638477aed ("perf/x86/intel/pt: Opportunistically use single range output mode")
Signed-off-by: Adrian Hunter <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/events/intel/pt.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 82ef87e..42a5579 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -1263,6 +1263,15 @@ static int pt_buffer_try_single(struct pt_buffer *buf, int nr_pages)
 	if (1 << order != nr_pages)
 		goto out;
 
+	/*
+	 * Some processors cannot always support single range for more than
+	 * 4KB - refer errata TGL052, ADL037 and RPL017. Future processors might
+	 * also be affected, so for now rather than trying to keep track of
+	 * which ones, just disable it for all.
+	 */
+	if (nr_pages > 1)
+		goto out;
+
 	buf->single = true;
 	buf->nr_pages = nr_pages;
 	ret = 0;