2024-04-12 17:31:11

by Joseph, Jithu

[permalink] [raw]
Subject: [PATCH 0/3] Miscelleanous In Field Scan changes

Hi Hans/Ilpo,

Bunch of In Field Scan patches

Patch1 - Classify Scan controller malfunction as untested
Patch2 - Trace output format related
Patch3 - Disable irq during a particular load step

Jithu Joseph (3):
platform/x86/intel/ifs: Classify error scenarios correctly
platform/x86/intel/ifs: trace: display batch num in hex
platform/x86/intel/ifs: Disable irq during one load stage

include/trace/events/intel_ifs.h | 2 +-
drivers/platform/x86/intel/ifs/load.c | 2 ++
drivers/platform/x86/intel/ifs/runtest.c | 27 +++++++++++++-----------
3 files changed, 18 insertions(+), 13 deletions(-)


base-commit: fec50db7033ea478773b159e0e2efb135270e3b7
--
2.25.1



2024-04-12 17:31:29

by Joseph, Jithu

[permalink] [raw]
Subject: [PATCH 2/3] platform/x86/intel/ifs: trace: display batch num in hex

In Field Scan test image files are named in ff-mm-ss-<batch02x>.scan
format. Current trace output, prints the batch number in decimal format.

Make it easier to correlate the trace line to a test image file by
showing the batch number also in hex format.

Add 0x prefix to all fields in the trace line to make the type explicit.

Signed-off-by: Jithu Joseph <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
---
include/trace/events/intel_ifs.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/trace/events/intel_ifs.h b/include/trace/events/intel_ifs.h
index 8ce2de120f2d..0d88ebf2c980 100644
--- a/include/trace/events/intel_ifs.h
+++ b/include/trace/events/intel_ifs.h
@@ -28,7 +28,7 @@ TRACE_EVENT(ifs_status,
__entry->status = status;
),

- TP_printk("batch: %.2d, start: %.4x, stop: %.4x, status: %.16llx",
+ TP_printk("batch: 0x%.2x, start: 0x%.4x, stop: 0x%.4x, status: 0x%.16llx",
__entry->batch,
__entry->start,
__entry->stop,
--
2.25.1


2024-04-12 17:31:30

by Joseph, Jithu

[permalink] [raw]
Subject: [PATCH 1/3] platform/x86/intel/ifs: Classify error scenarios correctly

Based on inputs from hardware architects, only "scan signature failures"
should be treated as actual hardware/cpu failure.

Current driver, in addition, classifies "scan controller error" scenario
too as a hardware/cpu failure. Modify the driver to classify this situation
with a more appropriate "untested" status instead of "fail" status.

Signed-off-by: Jithu Joseph <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
---
drivers/platform/x86/intel/ifs/runtest.c | 27 +++++++++++++-----------
1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/platform/x86/intel/ifs/runtest.c b/drivers/platform/x86/intel/ifs/runtest.c
index 95b4b71fab53..282e4bfe30da 100644
--- a/drivers/platform/x86/intel/ifs/runtest.c
+++ b/drivers/platform/x86/intel/ifs/runtest.c
@@ -69,6 +69,19 @@ static const char * const scan_test_status[] = {

static void message_not_tested(struct device *dev, int cpu, union ifs_status status)
{
+ struct ifs_data *ifsd = ifs_get_data(dev);
+
+ /*
+ * control_error is set when the microcode runs into a problem
+ * loading the image from the reserved BIOS memory, or it has
+ * been corrupted. Reloading the image may fix this issue.
+ */
+ if (status.control_error) {
+ dev_warn(dev, "CPU(s) %*pbl: Scan controller error. Batch: %02x version: 0x%x\n",
+ cpumask_pr_args(cpu_smt_mask(cpu)), ifsd->cur_batch, ifsd->loaded_version);
+ return;
+ }
+
if (status.error_code < ARRAY_SIZE(scan_test_status)) {
dev_info(dev, "CPU(s) %*pbl: SCAN operation did not start. %s\n",
cpumask_pr_args(cpu_smt_mask(cpu)),
@@ -90,16 +103,6 @@ static void message_fail(struct device *dev, int cpu, union ifs_status status)
{
struct ifs_data *ifsd = ifs_get_data(dev);

- /*
- * control_error is set when the microcode runs into a problem
- * loading the image from the reserved BIOS memory, or it has
- * been corrupted. Reloading the image may fix this issue.
- */
- if (status.control_error) {
- dev_err(dev, "CPU(s) %*pbl: could not execute from loaded scan image. Batch: %02x version: 0x%x\n",
- cpumask_pr_args(cpu_smt_mask(cpu)), ifsd->cur_batch, ifsd->loaded_version);
- }
-
/*
* signature_error is set when the output from the scan chains does not
* match the expected signature. This might be a transient problem (e.g.
@@ -285,10 +288,10 @@ static void ifs_test_core(int cpu, struct device *dev)
/* Update status for this core */
ifsd->scan_details = status.data;

- if (status.control_error || status.signature_error) {
+ if (status.signature_error) {
ifsd->status = SCAN_TEST_FAIL;
message_fail(dev, cpu, status);
- } else if (status.error_code) {
+ } else if (status.control_error || status.error_code) {
ifsd->status = SCAN_NOT_TESTED;
message_not_tested(dev, cpu, status);
} else {
--
2.25.1


2024-04-12 17:31:50

by Joseph, Jithu

[permalink] [raw]
Subject: [PATCH 3/3] platform/x86/intel/ifs: Disable irq during one load stage

One of the stages in IFS image loading process involves loading individual
chunks (test patterns) from test image file to secure memory.

Driver issues a WRMSR(MSR_AUTHENTICATE_AND_COPY_CHUNK) operation to do
this. This operation can take up to 5 msec, and if an interrupt occurs
in between, the AUTH_AND_COPY_CHUNK u-code implementation aborts the
operation.

Interrupt sources such as NMI or SMI are handled by retrying. Regular
interrupts may occur frequently enough to prevent this operation from ever
completing. Disable irq on local cpu around the aforementioned WRMSR to
allow the operation to complete.

Signed-off-by: Jithu Joseph <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
---
drivers/platform/x86/intel/ifs/load.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/platform/x86/intel/ifs/load.c b/drivers/platform/x86/intel/ifs/load.c
index 584c44387e10..39f19cb51749 100644
--- a/drivers/platform/x86/intel/ifs/load.c
+++ b/drivers/platform/x86/intel/ifs/load.c
@@ -233,7 +233,9 @@ static int copy_hashes_authenticate_chunks_gen2(struct device *dev)
chunk_table[0] = starting_chunk_nr + i;
chunk_table[1] = linear_addr;
do {
+ local_irq_disable();
wrmsrl(MSR_AUTHENTICATE_AND_COPY_CHUNK, (u64)chunk_table);
+ local_irq_enable();
rdmsrl(MSR_CHUNKS_AUTHENTICATION_STATUS, chunk_status.data);
err_code = chunk_status.error_code;
} while (err_code == AUTH_INTERRUPTED_ERROR && --retry_count);
--
2.25.1


Subject: Re: [PATCH 1/3] platform/x86/intel/ifs: Classify error scenarios correctly


On 4/12/24 10:23 AM, Jithu Joseph wrote:
> Based on inputs from hardware architects, only "scan signature failures"
> should be treated as actual hardware/cpu failure.

Instead of just saying input from hardware architects, it would be better
if you mention the rationale behind it.

> Current driver, in addition, classifies "scan controller error" scenario
> too as a hardware/cpu failure. Modify the driver to classify this situation
> with a more appropriate "untested" status instead of "fail" status.
>
> Signed-off-by: Jithu Joseph <[email protected]>
> Reviewed-by: Tony Luck <[email protected]>
> Reviewe

Code wise it looks good to me.

Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>

> d-by: Ashok Raj <[email protected]>
> ---
> drivers/platform/x86/intel/ifs/runtest.c | 27 +++++++++++++-----------
> 1 file changed, 15 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/platform/x86/intel/ifs/runtest.c b/drivers/platform/x86/intel/ifs/runtest.c
> index 95b4b71fab53..282e4bfe30da 100644
> --- a/drivers/platform/x86/intel/ifs/runtest.c
> +++ b/drivers/platform/x86/intel/ifs/runtest.c
> @@ -69,6 +69,19 @@ static const char * const scan_test_status[] = {
>
> static void message_not_tested(struct device *dev, int cpu, union ifs_status status)
> {
> + struct ifs_data *ifsd = ifs_get_data(dev);
> +
> + /*
> + * control_error is set when the microcode runs into a problem
> + * loading the image from the reserved BIOS memory, or it has
> + * been corrupted. Reloading the image may fix this issue.
> + */
> + if (status.control_error) {
> + dev_warn(dev, "CPU(s) %*pbl: Scan controller error. Batch: %02x version: 0x%x\n",
> + cpumask_pr_args(cpu_smt_mask(cpu)), ifsd->cur_batch, ifsd->loaded_version);
> + return;
> + }
> +
> if (status.error_code < ARRAY_SIZE(scan_test_status)) {
> dev_info(dev, "CPU(s) %*pbl: SCAN operation did not start. %s\n",
> cpumask_pr_args(cpu_smt_mask(cpu)),
> @@ -90,16 +103,6 @@ static void message_fail(struct device *dev, int cpu, union ifs_status status)
> {
> struct ifs_data *ifsd = ifs_get_data(dev);
>
> - /*
> - * control_error is set when the microcode runs into a problem
> - * loading the image from the reserved BIOS memory, or it has
> - * been corrupted. Reloading the image may fix this issue.
> - */
> - if (status.control_error) {
> - dev_err(dev, "CPU(s) %*pbl: could not execute from loaded scan image. Batch: %02x version: 0x%x\n",
> - cpumask_pr_args(cpu_smt_mask(cpu)), ifsd->cur_batch, ifsd->loaded_version);
> - }
> -
> /*
> * signature_error is set when the output from the scan chains does not
> * match the expected signature. This might be a transient problem (e.g.
> @@ -285,10 +288,10 @@ static void ifs_test_core(int cpu, struct device *dev)
> /* Update status for this core */
> ifsd->scan_details = status.data;
>
> - if (status.control_error || status.signature_error) {
> + if (status.signature_error) {
> ifsd->status = SCAN_TEST_FAIL;
> message_fail(dev, cpu, status);
> - } else if (status.error_code) {
> + } else if (status.control_error || status.error_code) {
> ifsd->status = SCAN_NOT_TESTED;
> message_not_tested(dev, cpu, status);
> } else {

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


Subject: Re: [PATCH 2/3] platform/x86/intel/ifs: trace: display batch num in hex


On 4/12/24 10:23 AM, Jithu Joseph wrote:
> In Field Scan test image files are named in ff-mm-ss-<batch02x>.scan
> format. Current trace output, prints the batch number in decimal format.
>
> Make it easier to correlate the trace line to a test image file by
> showing the batch number also in hex format.
>
> Add 0x prefix to all fields in the trace line to make the type explicit.
>
> Signed-off-by: Jithu Joseph <[email protected]>
> Reviewed-by: Tony Luck <[email protected]>
> Reviewed-by: Ashok Raj <[email protected]>
> ---

Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>

> include/trace/events/intel_ifs.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/trace/events/intel_ifs.h b/include/trace/events/intel_ifs.h
> index 8ce2de120f2d..0d88ebf2c980 100644
> --- a/include/trace/events/intel_ifs.h
> +++ b/include/trace/events/intel_ifs.h
> @@ -28,7 +28,7 @@ TRACE_EVENT(ifs_status,
> __entry->status = status;
> ),
>
> - TP_printk("batch: %.2d, start: %.4x, stop: %.4x, status: %.16llx",
> + TP_printk("batch: 0x%.2x, start: 0x%.4x, stop: 0x%.4x, status: 0x%.16llx",
> __entry->batch,
> __entry->start,
> __entry->stop,

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


2024-04-12 19:31:21

by Joseph, Jithu

[permalink] [raw]
Subject: Re: [PATCH 1/3] platform/x86/intel/ifs: Classify error scenarios correctly

Sathya,

Thanks for reviewing this

On 4/12/2024 11:32 AM, Kuppuswamy Sathyanarayanan wrote:
>
> On 4/12/24 10:23 AM, Jithu Joseph wrote:
>> Based on inputs from hardware architects, only "scan signature failures"
>> should be treated as actual hardware/cpu failure.
>
> Instead of just saying input from hardware architects, it would be better
> if you mention the rationale behind it.

I can reword the first para as below:

"Scan controller error" means that scan hardware encountered an error
prior to doing an actual test on the target CPU. It does not mean that
there is an actual cpu/core failure. "scan signature failure" indicates
that the test result on the target core did not match the expected value
and should be treated as a cpu failure.

Current driver classifies both these scenarios as failures. Modify ...

>
>> Current driver, in addition, classifies "scan controller error" scenario
>> too as a hardware/cpu failure. Modify the driver to classify this situation
>> with a more appropriate "untested" status instead of "fail" status.
>>
>> Signed-off-by: Jithu Joseph <[email protected]>
>> Reviewed-by: Tony Luck <[email protected]>
>> Reviewe
>
> Code wise it looks good to me.
>
> Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>
>


Jithu

Subject: Re: [PATCH 1/3] platform/x86/intel/ifs: Classify error scenarios correctly


On 4/12/24 12:31 PM, Joseph, Jithu wrote:
> Sathya,
>
> Thanks for reviewing this
>
> On 4/12/2024 11:32 AM, Kuppuswamy Sathyanarayanan wrote:
>> On 4/12/24 10:23 AM, Jithu Joseph wrote:
>>> Based on inputs from hardware architects, only "scan signature failures"
>>> should be treated as actual hardware/cpu failure.
>> Instead of just saying input from hardware architects, it would be better
>> if you mention the rationale behind it.
> I can reword the first para as below:
>
> "Scan controller error" means that scan hardware encountered an error
> prior to doing an actual test on the target CPU. It does not mean that
> there is an actual cpu/core failure. "scan signature failure" indicates
> that the test result on the target core did not match the expected value
> and should be treated as a cpu failure.
>
> Current driver classifies both these scenarios as failures. Modify ...

Looks good to me.

>>> Current driver, in addition, classifies "scan controller error" scenario
>>> too as a hardware/cpu failure. Modify the driver to classify this situation
>>> with a more appropriate "untested" status instead of "fail" status.
>>>
>>> Signed-off-by: Jithu Joseph <[email protected]>
>>> Reviewed-by: Tony Luck <[email protected]>
>>> Reviewe
>> Code wise it looks good to me.
>>
>> Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>
>>
>
> Jithu

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


Subject: Re: [PATCH 3/3] platform/x86/intel/ifs: Disable irq during one load stage


On 4/12/24 10:23 AM, Jithu Joseph wrote:
> One of the stages in IFS image loading process involves loading individual
> chunks (test patterns) from test image file to secure memory.
>
> Driver issues a WRMSR(MSR_AUTHENTICATE_AND_COPY_CHUNK) operation to do
> this. This operation can take up to 5 msec, and if an interrupt occurs
> in between, the AUTH_AND_COPY_CHUNK u-code implementation aborts the
> operation.
>
> Interrupt sources such as NMI or SMI are handled by retrying. Regular
> interrupts may occur frequently enough to prevent this operation from ever
> completing. Disable irq on local cpu around the aforementioned WRMSR to
> allow the operation to complete.
>
> Signed-off-by: Jithu Joseph <[email protected]>
> Reviewed-by: Tony Luck <[email protected]>
> Reviewed-by: Ashok Raj <[email protected]>
> ---


Looks good to me.

Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>


> drivers/platform/x86/intel/ifs/load.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/platform/x86/intel/ifs/load.c b/drivers/platform/x86/intel/ifs/load.c
> index 584c44387e10..39f19cb51749 100644
> --- a/drivers/platform/x86/intel/ifs/load.c
> +++ b/drivers/platform/x86/intel/ifs/load.c
> @@ -233,7 +233,9 @@ static int copy_hashes_authenticate_chunks_gen2(struct device *dev)
> chunk_table[0] = starting_chunk_nr + i;
> chunk_table[1] = linear_addr;
> do {
> + local_irq_disable();
> wrmsrl(MSR_AUTHENTICATE_AND_COPY_CHUNK, (u64)chunk_table);
> + local_irq_enable();
> rdmsrl(MSR_CHUNKS_AUTHENTICATION_STATUS, chunk_status.data);
> err_code = chunk_status.error_code;
> } while (err_code == AUTH_INTERRUPTED_ERROR && --retry_count);

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


2024-04-15 15:16:39

by Hans de Goede

[permalink] [raw]
Subject: Re: [PATCH 1/3] platform/x86/intel/ifs: Classify error scenarios correctly

Hi,

Thank you for this patch series.

On 4/12/24 9:31 PM, Joseph, Jithu wrote:
> Sathya,
>
> Thanks for reviewing this
>
> On 4/12/2024 11:32 AM, Kuppuswamy Sathyanarayanan wrote:
>>
>> On 4/12/24 10:23 AM, Jithu Joseph wrote:
>>> Based on inputs from hardware architects, only "scan signature failures"
>>> should be treated as actual hardware/cpu failure.
>>
>> Instead of just saying input from hardware architects, it would be better
>> if you mention the rationale behind it.
>
> I can reword the first para as below:
>
> "Scan controller error" means that scan hardware encountered an error
> prior to doing an actual test on the target CPU. It does not mean that
> there is an actual cpu/core failure. "scan signature failure" indicates
> that the test result on the target core did not match the expected value
> and should be treated as a cpu failure.
>
> Current driver classifies both these scenarios as failures. Modify ...

I've modified the commit message using the rewording suggested above
while merging this patch and I have merged the entire series:

Thank you for your patch-series, I've applied the series to my
review-hans branch:
https://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86.git/log/?h=review-hans

Once I've run some tests on this branch the patches there will be
added to the platform-drivers-x86/for-next branch and eventually
will be included in the pdx86 pull-request to Linus for the next
merge-window.

Regards,

Hans






>>> Current driver, in addition, classifies "scan controller error" scenario
>>> too as a hardware/cpu failure. Modify the driver to classify this situation
>>> with a more appropriate "untested" status instead of "fail" status.
>>>
>>> Signed-off-by: Jithu Joseph <[email protected]>
>>> Reviewed-by: Tony Luck <[email protected]>
>>> Reviewe
>>
>> Code wise it looks good to me.
>>
>> Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>
>>
>
>
> Jithu
>