The RAS SError Syndrome can be Implementation-Defined,
arm64_is_ras_serror() is used to judge whether it is RAS SError,
but arm64_is_ras_serror() does not include this judgement. In order
to avoid function name confusion, we rename the arm64_is_ras_serror()
to arm64_is_categorized_ras_serror(), this function is used to
judge whether it is categorized RAS Serror.
Change some code notes, unrecoverable RAS errors is imprecise, but
Recoverable RAS errors is precise.
Signed-off-by: Dongjiu Geng <[email protected]>
---
arch/arm64/include/asm/traps.h | 20 ++++++++------------
arch/arm64/kernel/traps.c | 9 +++++----
arch/arm64/kvm/handle_exit.c | 3 ++-
3 files changed, 15 insertions(+), 17 deletions(-)
diff --git a/arch/arm64/include/asm/traps.h b/arch/arm64/include/asm/traps.h
index 178e338..2a02b64 100644
--- a/arch/arm64/include/asm/traps.h
+++ b/arch/arm64/include/asm/traps.h
@@ -72,18 +72,23 @@ static inline int in_entry_text(unsigned long ptr)
* CPUs with the RAS extensions have an Implementation-Defined-Syndrome bit
* to indicate whether this ESR has a RAS encoding. CPUs without this feature
* have a ISS-Valid bit in the same position.
- * If this bit is set, we know its not a RAS SError.
+ * If this bit is set, we know it is not a categorized RAS SError.
* If its clear, we need to know if the CPU supports RAS. Uncategorized RAS
* errors share the same encoding as an all-zeros encoding from a CPU that
* doesn't support RAS.
*/
-static inline bool arm64_is_ras_serror(u32 esr)
+static inline bool arm64_is_categorized_ras_serror(u32 esr)
{
WARN_ON(preemptible());
+ /* This is Implementation-Defined Syndrome */
if (esr & ESR_ELx_IDS)
return false;
+ if ((esr & ESR_ELx_FSC) != ESR_ELx_FSC_SERROR)
+ /* No severity information : Uncategorized */
+ return false;
+
if (this_cpu_has_cap(ARM64_HAS_RAS_EXTN))
return true;
else
@@ -101,20 +106,11 @@ static inline u32 arm64_ras_serror_get_severity(u32 esr)
{
u32 aet = esr & ESR_ELx_AET;
- if (!arm64_is_ras_serror(esr)) {
+ if (!arm64_is_categorized_ras_serror(esr)) {
/* Not a RAS error, we can't interpret the ESR. */
return ESR_ELx_AET_UC;
}
- /*
- * AET is RES0 if 'the value returned in the DFSC field is not
- * [ESR_ELx_FSC_SERROR]'
- */
- if ((esr & ESR_ELx_FSC) != ESR_ELx_FSC_SERROR) {
- /* No severity information : Uncategorized */
- return ESR_ELx_AET_UC;
- }
-
return aet;
}
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index bbb0fde..e88096a 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -689,12 +689,12 @@ bool arm64_is_fatal_ras_serror(struct pt_regs *regs, unsigned int esr)
* a more severe error.
*/
return false;
-
+ /* The exception has been imprecise */
case ESR_ELx_AET_UEU: /* Uncorrected Unrecoverable */
+ /* The exception is precise */
case ESR_ELx_AET_UER: /* Uncorrected Recoverable */
/*
- * The CPU can't make progress. The exception may have
- * been imprecise.
+ * The CPU can't make progress.
*/
return true;
@@ -710,7 +710,8 @@ asmlinkage void do_serror(struct pt_regs *regs, unsigned int esr)
nmi_enter();
/* non-RAS errors are not containable */
- if (!arm64_is_ras_serror(esr) || arm64_is_fatal_ras_serror(regs, esr))
+ if (!arm64_is_categorized_ras_serror(esr) ||
+ arm64_is_fatal_ras_serror(regs, esr))
arm64_serror_panic(regs, esr);
nmi_exit();
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index e5e741b..913c19e 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -40,7 +40,8 @@
static void kvm_handle_guest_serror(struct kvm_vcpu *vcpu, u32 esr)
{
- if (!arm64_is_ras_serror(esr) || arm64_is_fatal_ras_serror(NULL, esr))
+ if (!arm64_is_categorized_ras_serror(esr) ||
+ arm64_is_fatal_ras_serror(NULL, esr))
kvm_inject_vabt(vcpu);
}
--
1.9.1
Hi Dongjiu Geng,
On 22/02/18 18:02, Dongjiu Geng wrote:
> The RAS SError Syndrome can be Implementation-Defined,
> arm64_is_ras_serror() is used to judge whether it is RAS SError,
> but arm64_is_ras_serror() does not include this judgement. In order
> to avoid function name confusion, we rename the arm64_is_ras_serror()
> to arm64_is_categorized_ras_serror(), this function is used to
> judge whether it is categorized RAS Serror.
I don't see how 'categorized' is relevant. The most significant ISS bit is used
to determine if this is an IMP-DEF ESR, or one that uses the architected layout.
DFSC of zero means uncategorised. What should we do with an uncategorised SError?
From 2.4.3 "ESB and other physical errors" of [0]
| It is IMPLEMENTATION DEFINED whether [..]uncategorized SError interrupts
| are containable or Uncontainable, and whether they can be synchronized by an
| Error Synchronization Barrier.
We treat uncategorized as uncontainable as that's the worst thing it could be.
On aarch32 uncontainable and uncategorized even share an encoding.
(AET:00 in G7.2.43 "DFSR, Data Fault Status Register", 'state of the PE after
taking the SError interrupt exception')
> Change some code notes, unrecoverable RAS errors is imprecise, but
> Recoverable RAS errors is precise.
Unrecoverable and Recoverable (but we don't know where it is) are both grouped
into 'can't make progress'. The comment says:
| The exception may have been imprecise.
because one of those two is imprecise.
If anyone cares which one is which, they can read the spec.
I expect this code will one day be replaced with proper kernel-first support,
its only here to avoid panic()ing due to corrected errors.
Thanks,
James
[0]
https://static.docs.arm.com/ddi0587/a/RAS%20Extension-release%20candidate_march_29.pdf
Hi James,
Thanks a lot for your review.
2018-02-24 1:58 GMT+08:00 James Morse <[email protected]>:
> Hi Dongjiu Geng,
>
> On 22/02/18 18:02, Dongjiu Geng wrote:
>> The RAS SError Syndrome can be Implementation-Defined,
>> arm64_is_ras_serror() is used to judge whether it is RAS SError,
>> but arm64_is_ras_serror() does not include this judgement. In order
>> to avoid function name confusion, we rename the arm64_is_ras_serror()
>> to arm64_is_categorized_ras_serror(), this function is used to
>> judge whether it is categorized RAS Serror.
>
> I don't see how 'categorized' is relevant. The most significant ISS bit is used
> to determine if this is an IMP-DEF ESR, or one that uses the architected layout.
From the name arm64_is_ras_serror(), it used to judge whether this is
RAS Serror,
but arm64_is_ras_serror() think the IMP-DEF SError is not RAS SError,
as shown the code note
and code in[1]. In fact the IMP-DEF SError is also RAS SError, so when
I read the code, it looks like
confusion, so I rename it to arm64_is_categorized_ras_serror(), then
this function is only used to
judge whether this is categorized RAS SError,
if it is categorized, the code will continue judge its Asynchronous Error Type.
if it is uncategorized, the code will panic(this is the original code
logic) or not panic when we support kernel-first or can isolate the
SError
[1]:
/*
* ...................................................
*CPUs with the RAS extensions have an Implementation-Defined-Syndrome bit
- * If this bit is set, we know its not a RAS SError.
* ...............................................................................
*/
static inline bool arm64_is_ras_serror(u32 esr)
{
WARN_ON(preemptible());
if (esr & ESR_ELx_IDS)
return false;
.........................
}
>
> DFSC of zero means uncategorised. What should we do with an uncategorised SError?
yes, so it returns false because it is uncategorised.
>
> From 2.4.3 "ESB and other physical errors" of [0]
> | It is IMPLEMENTATION DEFINED whether [..]uncategorized SError interrupts
> | are containable or Uncontainable, and whether they can be synchronized by an
> | Error Synchronization Barrier.
>
> We treat uncategorized as uncontainable as that's the worst thing it could be.
in original code, system will panic if it is uncategorized, I do not
change this logic.
I only rename this function name to make it less confusion
it no need to panic for uncontainable if we can isolate this error.
>
> On aarch32 uncontainable and uncategorized even share an encoding.
> (AET:00 in G7.2.43 "DFSR, Data Fault Status Register", 'state of the PE after
> taking the SError interrupt exception')
>
>
>> Change some code notes, unrecoverable RAS errors is imprecise, but
>> Recoverable RAS errors is precise.
>
> Unrecoverable and Recoverable (but we don't know where it is) are both grouped
> into 'can't make progress'. The comment says:
> | The exception may have been imprecise.
> because one of those two is imprecise.
>
> If anyone cares which one is which, they can read the spec.
>
> I expect this code will one day be replaced with proper kernel-first support,
> its only here to avoid panic()ing due to corrected errors.
Ok, thanks for the explanation.
>
>
> Thanks,
>
> James
>
> [0]
> https://static.docs.arm.com/ddi0587/a/RAS%20Extension-release%20candidate_march_29.pdf
Hi gengdongjiu,
On 26/02/18 16:13, gengdongjiu wrote:
> 2018-02-24 1:58 GMT+08:00 James Morse <[email protected]>:
>> On 22/02/18 18:02, Dongjiu Geng wrote:
>>> The RAS SError Syndrome can be Implementation-Defined,
>>> arm64_is_ras_serror() is used to judge whether it is RAS SError,
>>> but arm64_is_ras_serror() does not include this judgement. In order
>>> to avoid function name confusion, we rename the arm64_is_ras_serror()
>>> to arm64_is_categorized_ras_serror(), this function is used to
>>> judge whether it is categorized RAS Serror.
>>
>> I don't see how 'categorized' is relevant. The most significant ISS bit is used
>> to determine if this is an IMP-DEF ESR, or one that uses the architected layout.
>
> From the name arm64_is_ras_serror(), it used to judge whether this is
> RAS Serror,
> but arm64_is_ras_serror() think the IMP-DEF SError is not RAS SError,
> as shown the code note and code in[1].
> In fact the IMP-DEF SError is also RAS SError, so when I read the
> code, it looks like
This is just you then. No-one else has your imp-def:RAS error ESR values.
This would be like me adding some impdef branch instruction, then claiming
aarch64_insn_is_branch() doesn't take account of my private additions.
I agree the name is assuming all architected ESR are RAS-errors, and that impdef
ESR are just that: impdef, that's all we know about them. Unless this causes us
to do the wrong thing, I don't think it matters.
Obviously we would need to change it if a new architected ESR is added.
> confusion, so I rename it to arm64_is_categorized_ras_serror(), then
This is actually worse, because there is an architected ESR for 'uncategorized',
that the helper papers-over and treats as uncontained. Calling it 'categorized'
means we now have three states, not two.
> this function is only used to
> judge whether this is categorized RAS SError,
> if it is categorized, the code will continue judge its Asynchronous Error Type.
> if it is uncategorized, the code will panic(this is the original code
> logic) or not panic when we support kernel-first or can isolate the
> SError
Thanks,
James
Hi James,
> Hi gengdongjiu,
>
> On 26/02/18 16:13, gengdongjiu wrote:
> > 2018-02-24 1:58 GMT+08:00 James Morse <[email protected]>:
> >> On 22/02/18 18:02, Dongjiu Geng wrote:
> >>> The RAS SError Syndrome can be Implementation-Defined,
> >>> arm64_is_ras_serror() is used to judge whether it is RAS SError, but
> >>> arm64_is_ras_serror() does not include this judgement. In order to
> >>> avoid function name confusion, we rename the arm64_is_ras_serror()
> >>> to arm64_is_categorized_ras_serror(), this function is used to judge
> >>> whether it is categorized RAS Serror.
> >>
> >> I don't see how 'categorized' is relevant. The most significant ISS
> >> bit is used to determine if this is an IMP-DEF ESR, or one that uses the architected layout.
> >
> > From the name arm64_is_ras_serror(), it used to judge whether this is
> > RAS Serror, but arm64_is_ras_serror() think the IMP-DEF SError is not
> > RAS SError, as shown the code note and code in[1].
>
> > In fact the IMP-DEF SError is also RAS SError, so when I read the
> > code, it looks like
>
> This is just you then. No-one else has your imp-def:RAS error ESR values.
>
> This would be like me adding some impdef branch instruction, then claiming
> aarch64_insn_is_branch() doesn't take account of my private additions.
>
> I agree the name is assuming all architected ESR are RAS-errors, and that impdef ESR are just that: impdef, that's all we know about them.
> Unless this causes us to do the wrong thing, I don't think it matters.
> Obviously we would need to change it if a new architected ESR is added.
Ok, let us keep the current code and not change it until a new architected ESR is added