by Chen, Gong


Subject: Re: [PATCH v3 8/9] ACPI, APEI, CPER: Cleanup CPER memory error output format

On Fri, Oct 18, 2013 at 05:31:21PM +0530, Naveen N. Rao wrote:
> Date: Fri, 18 Oct 2013 17:31:21 +0530
> From: "Naveen N. Rao" <[email protected]>
> To: "Chen, Gong" <[email protected]>, [email protected],
> [email protected], [email protected], [email protected]
> CC: [email protected], [email protected],
> [email protected]
> Subject: Re: [PATCH v3 8/9] ACPI, APEI, CPER: Cleanup CPER memory error
> output format
> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101
> Thunderbird/24.0
>
[...]
> >
> >@@ -358,17 +349,21 @@ void cper_estatus_print(const char *pfx,
> > struct acpi_generic_data *gdata;
> > unsigned int data_len, gedata_len;
> > int sec_no = 0;
> >+ char newpfx[64];
> > __u16 severity;
> >
> >- printk("%s""Generic Hardware Error Status\n", pfx);
> > severity = estatus->error_severity;
> >- printk("%s""severity: %d, %s\n", pfx, severity,
> >- cper_severity_str(severity));
> >+ if (severity != CPER_SEV_FATAL)
>
> Shouldn't this just be (severity == CPER_SEV_CORRECTED)?
>
> Thanks,
> Naveen
>
IMO, only a fatal error can't be handled gracefully by the current
kernel plus H/W. Once an error can be recovered by H/W and OS, we
can call it recovered.
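
To make the difference between the two conditions concrete, here is a minimal user-space sketch. It assumes the severity enum is ordered as I recall it from include/linux/cper.h (recoverable, fatal, corrected, informational); the two helper names are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror the enum in include/linux/cper.h; the ordering
 * is an assumption of this sketch, not taken from the patch. */
enum cper_sev {
	CPER_SEV_RECOVERABLE,
	CPER_SEV_FATAL,
	CPER_SEV_CORRECTED,
	CPER_SEV_INFORMATIONAL,
};

/* Hypothetical helper: the condition used in the patch hunk. */
bool patch_condition(enum cper_sev severity)
{
	assert(severity <= CPER_SEV_INFORMATIONAL);
	return severity != CPER_SEV_FATAL;
}

/* Hypothetical helper: the stricter condition suggested in review. */
bool review_condition(enum cper_sev severity)
{
	assert(severity <= CPER_SEV_INFORMATIONAL);
	return severity == CPER_SEV_CORRECTED;
}
```

Under these assumptions the two conditions agree for fatal and corrected errors and differ for recoverable and informational ones, which is exactly the case being argued about above.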


2013-10-19 11:46:27

by Chen, Gong


Subject: Re: [PATCH v3 4/9] ACPI, x86: Extended error log driver for x86 platform

On Fri, Oct 18, 2013 at 06:07:56PM +0530, Naveen N. Rao wrote:
> Date: Fri, 18 Oct 2013 18:07:56 +0530
> From: "Naveen N. Rao" <[email protected]>
> To: "Chen, Gong" <[email protected]>, [email protected],
> [email protected], [email protected], [email protected]
> CC: [email protected], [email protected],
> [email protected]
> Subject: Re: [PATCH v3 4/9] ACPI, x86: Extended error log driver for x86
> platform
> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101
> Thunderbird/24.0
>
[...]
> >+
> >+MODULE_AUTHOR("Chen, Gong <[email protected]>");
> >+MODULE_DESCRIPTION("Extended Error Log Driver");
>
> "Extended MCA Error Log Driver"?
>

Looks fine to me. Tony, would you please help to fix it when you pick up the
patch? Thanks in advance!


2013-10-20 07:21:29

On 2014/6/28 6:10, Luck, Tony wrote:
>>> Not all machine checks are fatal - it would be bad for us to go into
>>> an infinite spin instead of executing the recovery code.
>>
>> Then for the time being extlog shouldn't hook into the decoder chain
>> but into mce_process_work, i.e. the last should call it. Or maybe add
>> another notifier which is not atomic...
>
> I spoke too quickly. The only MCE for which we have recovery code are
> those that hit in application code. So the processor that is trying to do
> the printk() can't possibly be holding the locks. Other processors might
> have held the lock at the time of the MCE - but they have all returned
> from the handler at the time we try the printk - so they will make progress
> and release the lock so that we can acquire it.

Thank you for your reply.

When we get an MCE that hits in application code, it is broadcast to the
other processors immediately. Other processors that might have held the lock
at the time of the MCE then have no chance to release the lock and return from
printk, do they?

I know this rarely happens in production environments, but I think it is still
a risk here. So it would be very good to have a printk that is safe in atomic
context in the future.
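
As a rough illustration of what "safe in atomic context" could mean, here is a user-space model, not kernel code and not how printk is implemented. The idea is to never spin on the console lock from the atomic path: try the lock once, and if another CPU holds it, stash the message for a later flush from a context where waiting is allowed. All names here (console_trylock_model, log_from_atomic, flush_pending, the buffer sizes) are hypothetical, and the pending buffer stands in for a per-CPU buffer used by one CPU only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define PENDING_SLOTS 8
#define MSG_LEN 128

/* Models the lock that the normal logging path takes. */
atomic_flag console_lock = ATOMIC_FLAG_INIT;

/* Models a per-CPU stash; only one CPU touches it in this sketch. */
char pending[PENDING_SLOTS][MSG_LEN];
unsigned int pending_count;

/* Returns true if the lock was free and is now held by the caller. */
bool console_trylock_model(void)
{
	return !atomic_flag_test_and_set(&console_lock);
}

void console_unlock_model(void)
{
	atomic_flag_clear(&console_lock);
}

/* Atomic-context path: never waits for the lock.
 * Returns true if printed directly, false if deferred or dropped. */
bool log_from_atomic(const char *msg)
{
	assert(msg != NULL);
	if (console_trylock_model()) {
		fputs(msg, stderr);
		console_unlock_model();
		return true;
	}
	/* Lock is held elsewhere: stash the message, drop it if full. */
	if (pending_count < PENDING_SLOTS) {
		snprintf(pending[pending_count], MSG_LEN, "%s", msg);
		pending_count++;
	}
	return false;
}

/* Non-atomic path: waiting for the lock is acceptable here.
 * Returns the number of deferred messages written out. */
unsigned int flush_pending(void)
{
	unsigned int i, flushed;

	while (!console_trylock_model())
		; /* spin until the holder releases the lock */
	for (i = 0; i < pending_count; i++)
		fputs(pending[i], stderr);
	flushed = pending_count;
	pending_count = 0;
	console_unlock_model();
	return flushed;
}
```

In this model a handler that finds the lock held returns immediately instead of spinning, so the scenario above cannot hang it. The cost is that the message appears late, or is lost if the stash is full.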

--
Thanks,
XiuQi

>
> -Tony
>