This series adds support for extended physical addresses on newer
AMD processors such as AMD 'Milan'. It also caches
MCA_CONFIG[McaX], similar to MCA_CONFIG[McaLsbInStatusSupported], to avoid
reading the MCA_CONFIG register each time.
The first patch defines a separate helper function to extract
MCA_ADDR[ErrorAddr].
The second patch adds support for extended ErrorAddr bits in MCA_ADDR.
The third patch caches MCA_CONFIG[McaX], similar to
MCA_CONFIG[McaLsbInStatusSupported], in the existing struct mce_bank.
The last patch fixes unnecessary padding in struct mce_bank.
Link:
https://lkml.kernel.org/r/[email protected]
Smita Koralahalli (4):
x86/mce: Define function to extract ErrorAddr from MCA_ADDR
x86/mce: Add support for Extended Physical Address MCA changes
x86/mce, EDAC/mce_amd: Cache MCA_CONFIG[McaX] in struct mce_bank
x86/mce: Avoid unnecessary padding in struct mce_bank
arch/x86/include/asm/mce.h | 14 +++----
arch/x86/kernel/cpu/mce/amd.c | 59 +++++++++++++++++++++++-------
arch/x86/kernel/cpu/mce/core.c | 20 +++-------
arch/x86/kernel/cpu/mce/internal.h | 26 +++++++++++++
arch/x86/kernel/cpu/mce/severity.c | 6 +--
drivers/edac/mce_amd.c | 5 +--
6 files changed, 86 insertions(+), 44 deletions(-)
--
2.17.1
Newer AMD processors such as AMD 'Milan' support more physical address
bits. That is, the MCA_ADDR registers on Scalable MCA systems contain
the ErrorAddr in bits [56:0] instead of [55:0]. Hence, the existing LSB
field in bits [61:56] of MCA_ADDR must be relocated to accommodate the
larger ErrorAddr size.
MCA_CONFIG[McaLsbInStatusSupported] indicates this change. If set, the
LSB field will be found in MCA_STATUS rather than MCA_ADDR.
Each logical CPU has its own set of MCA banks in hardware, which are not
shared with other logical CPUs. Additionally, on SMCA systems, each
feature bit may differ per bank within the same logical CPU.
Therefore, check MCA_CONFIG[McaLsbInStatusSupported] for each MCA bank
and for each CPU.
Signed-off-by: Smita Koralahalli <[email protected]>
Reviewed-by: Yazen Ghannam <[email protected]>
---
Link:
https://lkml.kernel.org/r/[email protected]
v2:
Declared lsb_in_status in existing mce_bank[] struct.
Moved struct mce_bank[] declaration from core.c -> internal.h
v3:
Rebased on the latest tip tree. No functional changes.
---
arch/x86/include/asm/mce.h | 2 ++
arch/x86/kernel/cpu/mce/amd.c | 25 +++++++++++++++++++------
arch/x86/kernel/cpu/mce/core.c | 13 ++++---------
arch/x86/kernel/cpu/mce/internal.h | 14 ++++++++++++++
4 files changed, 39 insertions(+), 15 deletions(-)
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 99a4c32cbdfa..cc67c74e8b46 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -338,6 +338,7 @@ extern int mce_threshold_remove_device(unsigned int cpu);
void mce_amd_feature_init(struct cpuinfo_x86 *c);
enum smca_bank_types smca_get_bank_type(unsigned int cpu, unsigned int bank);
void smca_extract_err_addr(struct mce *m);
+void smca_feature_init(void);
#else
static inline int mce_threshold_create_device(unsigned int cpu) { return 0; };
@@ -345,6 +346,7 @@ static inline int mce_threshold_remove_device(unsigned int cpu) { return 0; };
static inline bool amd_mce_is_memory_error(struct mce *m) { return false; };
static inline void mce_amd_feature_init(struct cpuinfo_x86 *c) { }
static inline void smca_extract_err_addr(struct mce *m) { }
+static inline void smca_feature_init(void) { }
#endif
static inline void mce_hygon_feature_init(struct cpuinfo_x86 *c) { return mce_amd_feature_init(c); }
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 981d718851a2..ed75d4bd2aff 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -724,9 +724,26 @@ bool amd_mce_is_memory_error(struct mce *m)
void smca_extract_err_addr(struct mce *m)
{
- u8 lsb = (m->addr >> 56) & 0x3f;
+ if (this_cpu_ptr(mce_banks_array)[m->bank].lsb_in_status) {
+ u8 lsb = (m->status >> 24) & 0x3f;
- m->addr &= GENMASK_ULL(55, lsb);
+ m->addr &= GENMASK_ULL(56, lsb);
+ } else {
+ u8 lsb = (m->addr >> 56) & 0x3f;
+
+ m->addr &= GENMASK_ULL(55, lsb);
+ }
+}
+
+void smca_feature_init(void)
+{
+ unsigned int bank;
+ u64 mca_cfg;
+
+ for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank) {
+ rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(bank), mca_cfg);
+ this_cpu_ptr(mce_banks_array)[bank].lsb_in_status = !!(mca_cfg & BIT(8));
+ }
}
static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
@@ -743,10 +760,6 @@ static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
if (m.status & MCI_STATUS_ADDRV) {
m.addr = addr;
- /*
- * Extract [55:<lsb>] where lsb is the least significant
- * *valid* bit of the address bits.
- */
if (mce_flags.smca)
smca_extract_err_addr(&m);
}
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index f031f2668523..92adce850488 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -67,11 +67,7 @@ DEFINE_PER_CPU(unsigned, mce_exception_count);
DEFINE_PER_CPU_READ_MOSTLY(unsigned int, mce_num_banks);
-struct mce_bank {
- u64 ctl; /* subevents to enable */
- bool init; /* initialise bank? */
-};
-static DEFINE_PER_CPU_READ_MOSTLY(struct mce_bank[MAX_NR_BANKS], mce_banks_array);
+DEFINE_PER_CPU_READ_MOSTLY(struct mce_bank[MAX_NR_BANKS], mce_banks_array);
#define ATTR_LEN 16
/* One object for each MCE bank, shared by all CPUs */
@@ -660,10 +656,6 @@ static noinstr void mce_read_aux(struct mce *m, int i)
m->addr <<= shift;
}
- /*
- * Extract [55:<lsb>] where lsb is the least significant
- * *valid* bit of the address bits.
- */
if (mce_flags.smca)
smca_extract_err_addr(m);
}
@@ -1902,6 +1894,9 @@ static void __mcheck_cpu_init_early(struct cpuinfo_x86 *c)
mce_flags.succor = !!cpu_has(c, X86_FEATURE_SUCCOR);
mce_flags.smca = !!cpu_has(c, X86_FEATURE_SMCA);
mce_flags.amd_threshold = 1;
+
+ if (mce_flags.smca)
+ smca_feature_init();
}
}
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 52c633950b38..39dc37052cb9 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -175,6 +175,20 @@ struct mce_vendor_flags {
extern struct mce_vendor_flags mce_flags;
+struct mce_bank {
+ u64 ctl; /* subevents to enable */
+ bool init; /* initialise bank? */
+
+ /*
+ * (AMD) MCA_CONFIG[McaLsbInStatusSupported]: This bit indicates
+ * the LSB field is found in MCA_STATUS, when set.
+ */
+ __u64 lsb_in_status : 1,
+ __reserved_1 : 63;
+};
+
+DECLARE_PER_CPU_READ_MOSTLY(struct mce_bank[MAX_NR_BANKS], mce_banks_array);
+
enum mca_msr {
MCA_CTL,
MCA_STATUS,
--
2.17.1
Move MCA_ADDR[ErrorAddr] extraction into a separate helper function. This
will be further refactored in the next patch.
Signed-off-by: Smita Koralahalli <[email protected]>
Reviewed-by: Yazen Ghannam <[email protected]>
---
Link:
https://lkml.kernel.org/r/[email protected]
v2:
No change.
v3:
Rebased on the latest tip tree. No functional changes.
---
arch/x86/include/asm/mce.h | 2 ++
arch/x86/kernel/cpu/mce/amd.c | 14 +++++++++-----
arch/x86/kernel/cpu/mce/core.c | 7 ++-----
3 files changed, 13 insertions(+), 10 deletions(-)
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index cc73061e7255..99a4c32cbdfa 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -337,12 +337,14 @@ extern int mce_threshold_remove_device(unsigned int cpu);
void mce_amd_feature_init(struct cpuinfo_x86 *c);
enum smca_bank_types smca_get_bank_type(unsigned int cpu, unsigned int bank);
+void smca_extract_err_addr(struct mce *m);
#else
static inline int mce_threshold_create_device(unsigned int cpu) { return 0; };
static inline int mce_threshold_remove_device(unsigned int cpu) { return 0; };
static inline bool amd_mce_is_memory_error(struct mce *m) { return false; };
static inline void mce_amd_feature_init(struct cpuinfo_x86 *c) { }
+static inline void smca_extract_err_addr(struct mce *m) { }
#endif
static inline void mce_hygon_feature_init(struct cpuinfo_x86 *c) { return mce_amd_feature_init(c); }
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 1940d305db1c..981d718851a2 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -722,6 +722,13 @@ bool amd_mce_is_memory_error(struct mce *m)
return m->bank == 4 && xec == 0x8;
}
+void smca_extract_err_addr(struct mce *m)
+{
+ u8 lsb = (m->addr >> 56) & 0x3f;
+
+ m->addr &= GENMASK_ULL(55, lsb);
+}
+
static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
{
struct mce m;
@@ -740,11 +747,8 @@ static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
* Extract [55:<lsb>] where lsb is the least significant
* *valid* bit of the address bits.
*/
- if (mce_flags.smca) {
- u8 lsb = (m.addr >> 56) & 0x3f;
-
- m.addr &= GENMASK_ULL(55, lsb);
- }
+ if (mce_flags.smca)
+ smca_extract_err_addr(&m);
}
if (mce_flags.smca) {
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 4f1e825033ce..f031f2668523 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -664,11 +664,8 @@ static noinstr void mce_read_aux(struct mce *m, int i)
* Extract [55:<lsb>] where lsb is the least significant
* *valid* bit of the address bits.
*/
- if (mce_flags.smca) {
- u8 lsb = (m->addr >> 56) & 0x3f;
-
- m->addr &= GENMASK_ULL(55, lsb);
- }
+ if (mce_flags.smca)
+ smca_extract_err_addr(m);
}
if (mce_flags.smca) {
--
2.17.1
Cache the value of MCA_CONFIG[McaX] in the existing struct mce_bank,
similar to MCA_CONFIG[McaLsbInStatusSupported].
This eliminates the need to read the MCA_CONFIG register each time McaX
is checked.
Signed-off-by: Smita Koralahalli <[email protected]>
Reviewed-by: Yazen Ghannam <[email protected]>
---
arch/x86/include/asm/mce.h | 10 ++--------
arch/x86/kernel/cpu/mce/amd.c | 24 ++++++++++++++++++++----
arch/x86/kernel/cpu/mce/internal.h | 13 ++++++++++++-
arch/x86/kernel/cpu/mce/severity.c | 6 +-----
drivers/edac/mce_amd.c | 5 +----
5 files changed, 36 insertions(+), 22 deletions(-)
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index cc67c74e8b46..bb2d45b0bb89 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -50,14 +50,6 @@
#define MCI_STATUS_POISON BIT_ULL(43) /* access poisonous data */
#define MCI_STATUS_SCRUB BIT_ULL(40) /* Error detected during scrub operation */
-/*
- * McaX field if set indicates a given bank supports MCA extensions:
- * - Deferred error interrupt type is specifiable by bank.
- * - MCx_MISC0[BlkPtr] field indicates presence of extended MISC registers,
- * But should not be used to determine MSR numbers.
- * - TCC bit is present in MCx_STATUS.
- */
-#define MCI_CONFIG_MCAX 0x1
#define MCI_IPID_MCATYPE 0xFFFF0000
#define MCI_IPID_HWID 0xFFF
@@ -339,6 +331,7 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c);
enum smca_bank_types smca_get_bank_type(unsigned int cpu, unsigned int bank);
void smca_extract_err_addr(struct mce *m);
void smca_feature_init(void);
+bool smca_config_get_mcax(struct mce *m);
#else
static inline int mce_threshold_create_device(unsigned int cpu) { return 0; };
@@ -347,6 +340,7 @@ static inline bool amd_mce_is_memory_error(struct mce *m) { return false; };
static inline void mce_amd_feature_init(struct cpuinfo_x86 *c) { }
static inline void smca_extract_err_addr(struct mce *m) { }
static inline void smca_feature_init(void) { }
+static inline bool smca_config_get_mcax(struct mce *m) { return false; };
#endif
static inline void mce_hygon_feature_init(struct cpuinfo_x86 *c) { return mce_amd_feature_init(c); }
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index ed75d4bd2aff..4ec644611f33 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -257,10 +257,7 @@ static void smca_set_misc_banks_map(unsigned int bank, unsigned int cpu)
* For SMCA enabled processors, BLKPTR field of the first MISC register
* (MCx_MISC0) indicates presence of additional MISC regs set (MISC1-4).
*/
- if (rdmsr_safe(MSR_AMD64_SMCA_MCx_CONFIG(bank), &low, &high))
- return;
-
- if (!(low & MCI_CONFIG_MCAX))
+ if (!(this_cpu_ptr(mce_banks_array)[bank].mcax))
return;
if (rdmsr_safe(MSR_AMD64_SMCA_MCx_MISC(bank), &low, &high))
@@ -735,6 +732,24 @@ void smca_extract_err_addr(struct mce *m)
}
}
+bool smca_config_get_mcax(struct mce *m)
+{
+ struct mce_bank *mce_banks;
+
+ /*
+ * Check if the bank number is valid. It could be possible for the
+ * user to input unavailable bank number when doing SW error injection
+ * or to get an invalid bank number like with APEI memory handling.
+ */
+ if (m->bank >= per_cpu(mce_num_banks, m->extcpu))
+ return false;
+
+ mce_banks = per_cpu_ptr(mce_banks_array, m->extcpu);
+
+ return mce_banks[m->bank].mcax;
+}
+EXPORT_SYMBOL_GPL(smca_config_get_mcax);
+
void smca_feature_init(void)
{
unsigned int bank;
@@ -743,6 +758,7 @@ void smca_feature_init(void)
for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank) {
rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(bank), mca_cfg);
this_cpu_ptr(mce_banks_array)[bank].lsb_in_status = !!(mca_cfg & BIT(8));
+ this_cpu_ptr(mce_banks_array)[bank].mcax = !!(mca_cfg & BIT(0));
}
}
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 39dc37052cb9..422387f8699d 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -184,7 +184,18 @@ struct mce_bank {
* the LSB field is found in MCA_STATUS, when set.
*/
__u64 lsb_in_status : 1,
- __reserved_1 : 63;
+
+ /*
+ * (AMD) MCA_CONFIG[McaX]: This bit when set indicates a given
+ * bank supports MCA extensions:
+ * - Deferred error interrupt type is specifiable by bank.
+ * - MCx_MISC0[BlkPtr] field indicates presence of extended MISC registers,
+ * But should not be used to determine MSR numbers.
+ * - TCC bit is present in MCx_STATUS.
+ */
+ mcax : 1,
+
+ __reserved_1 : 62;
};
DECLARE_PER_CPU_READ_MOSTLY(struct mce_bank[MAX_NR_BANKS], mce_banks_array);
diff --git a/arch/x86/kernel/cpu/mce/severity.c b/arch/x86/kernel/cpu/mce/severity.c
index 7aa2bda93cbb..95030a162f61 100644
--- a/arch/x86/kernel/cpu/mce/severity.c
+++ b/arch/x86/kernel/cpu/mce/severity.c
@@ -303,8 +303,6 @@ static noinstr int error_context(struct mce *m, struct pt_regs *regs)
static int mce_severity_amd_smca(struct mce *m, enum context err_ctx)
{
- u64 mcx_cfg;
-
/*
* We need to look at the following bits:
* - "succor" bit (data poisoning support), and
@@ -314,10 +312,8 @@ static int mce_severity_amd_smca(struct mce *m, enum context err_ctx)
if (!mce_flags.succor)
return MCE_PANIC_SEVERITY;
- mcx_cfg = mce_rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(m->bank));
-
/* TCC (Task context corrupt). If set and if IN_KERNEL, panic. */
- if ((mcx_cfg & MCI_CONFIG_MCAX) &&
+ if ((smca_config_get_mcax(m)) &&
(m->status & MCI_STATUS_TCC) &&
(err_ctx == IN_KERNEL))
return MCE_PANIC_SEVERITY;
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index cc5c63feb26a..415201ce35c7 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -1254,11 +1254,8 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
((m->status & MCI_STATUS_PCC) ? "PCC" : "-"));
if (boot_cpu_has(X86_FEATURE_SMCA)) {
- u32 low, high;
- u32 addr = MSR_AMD64_SMCA_MCx_CONFIG(m->bank);
- if (!rdmsr_safe(addr, &low, &high) &&
- (low & MCI_CONFIG_MCAX))
+ if (smca_config_get_mcax(m))
pr_cont("|%s", ((m->status & MCI_STATUS_TCC) ? "TCC" : "-"));
pr_cont("|%s", ((m->status & MCI_STATUS_SYNDV) ? "SyndV" : "-"));
--
2.17.1
Include the struct mce_bank member "init" in the bitfield by changing its
type from bool, to get rid of unnecessary padding and to reduce the
overall struct size.
pahole output collected before and after the change:
$ pahole -C mce_bank arch/x86/kernel/cpu/mce/core.o
before:
/* size: 24, cachelines: 1, members: 5 */
/* bit holes: 1, sum bit holes: 62 bits */
/* bit_padding: 2 bits */
/* last cacheline: 24 bytes */
after:
/* size: 16, cachelines: 1, members: 5 */
/* last cacheline: 16 bytes */
No functional changes.
Signed-off-by: Smita Koralahalli <[email protected]>
Reviewed-by: Yazen Ghannam <[email protected]>
---
arch/x86/kernel/cpu/mce/internal.h | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 422387f8699d..0b0f55f0c585 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -177,13 +177,14 @@ extern struct mce_vendor_flags mce_flags;
struct mce_bank {
u64 ctl; /* subevents to enable */
- bool init; /* initialise bank? */
+
+ __u64 init : 1, /* initialise bank? */
/*
* (AMD) MCA_CONFIG[McaLsbInStatusSupported]: This bit indicates
* the LSB field is found in MCA_STATUS, when set.
*/
- __u64 lsb_in_status : 1,
+ lsb_in_status : 1,
/*
* (AMD) MCA_CONFIG[McaX]: This bit when set indicates a given
@@ -195,7 +196,7 @@ struct mce_bank {
*/
mcax : 1,
- __reserved_1 : 62;
+ __reserved_1 : 61;
};
DECLARE_PER_CPU_READ_MOSTLY(struct mce_bank[MAX_NR_BANKS], mce_banks_array);
--
2.17.1
On Fri, Feb 11, 2022 at 04:34:41PM -0600, Smita Koralahalli wrote:
> Cache the value of MCA_CONFIG[McaX] in the existing mce_bank struct
> similar to MCA_CONFIG[McaLsbInStatusSupported].
>
> This simplifies and eliminates the need to read MCA_CONFIG register each
> time to check McaX.
I don't see the point for this change, frankly.
I doubt it is speed because those are not really hot paths.
Code savings ain't either: 5 files changed, 36 insertions(+), 22 deletions(-)
Having yet another exported function to modules if not really necessary
doesn't make it prettier too.
So if there's no point for it, you can simply drop it.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Feb 11, 2022 at 04:34:42PM -0600, Smita Koralahalli wrote:
> Include struct mce_bank member "init" in the bitfield by changing its type
> from bool to get rid of unnecessary padding and to reduce the overall
> struct size.
>
> Outputs collected before and after the change.
>
> $ pahole -C mce_bank arch/x86/kernel/cpu/mce/core.o
>
> before:
>
> /* size: 24, cachelines: 1, members: 5 */
> /* bit holes: 1, sum bit holes: 62 bits */
> /* bit_padding: 2 bits */
> /* last cacheline: 24 bytes */
>
> after:
>
> /* size: 16, cachelines: 1, members: 5 */
> /* last cacheline: 16 bytes */
I don't mind cleanups like that but when you send them as part of a set
adding new functionality, the usual rule is to put bug fixes, cleanups,
improvements, etc to the existing code *first*, and then, ontop you add
your new code.
IOW, this patch should be first in your set.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 2/22/22 9:35 AM, Borislav Petkov wrote:
> On Fri, Feb 11, 2022 at 04:34:41PM -0600, Smita Koralahalli wrote:
>> Cache the value of MCA_CONFIG[McaX] in the existing mce_bank struct
>> similar to MCA_CONFIG[McaLsbInStatusSupported].
>>
>> This simplifies and eliminates the need to read MCA_CONFIG register each
>> time to check McaX.
> I don't see the point for this change, frankly.
>
> I doubt it is speed because those are not really hot paths.
>
> Code savings ain't either: 5 files changed, 36 insertions(+), 22 deletions(-)
>
> Having yet another exported function to modules if not really necessary
> doesn't make it prettier too.
>
> So if there's no point for it, you can simply drop it.
>
> Thx.
>
Hmm okay. The main thought to come up with this patch was of course speed.
The speed might not matter much when we are trying to configure registers
as in mce/amd.c.
But what do you think of severity? Will this make an impact when handling
panic severity levels? .. mce_severity_amd_smca().
I can drop this patch if it doesn't impact much.
Thanks,
Smita
On Tue, Feb 22, 2022 at 02:47:44PM -0600, Koralahalli Channabasappa, Smita wrote:
> But what do you think of severity? Will this make an impact when handling
> panic severity levels? .. mce_severity_amd_smca().
Well, look at the code: severity grading gets called when either polling
or #MC handler gets to log an MCE. Reading an MSR costs a couple of
hundred cycles. The whole MCE logging path costs maybe a couple of
*orders* of magnitude more so that MSR read is in the noise when you
have a 4GHz CPU executing 4 billion cycles per second.
Now, that's for a single MCE.
If it were more, say 10s, 100s, 1000s MCEs, then the MSR read is the
least of your problems.
But this is me conjecturing - I'm always interested in a real proof
where it shows or it does not.
I guess what I'm trying to say is, yeah, sure, speed is mostly a good
argument. But you always need to consider at what cost you'd get that
speed. And if at all. There are other important things like keeping the
code base maintainable, readable and able to accept modifications for
new features.
So there's always this question of balance that needs to be asked...
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
Hi Boris,
On 2/22/22 9:36 AM, Borislav Petkov wrote:
> On Fri, Feb 11, 2022 at 04:34:42PM -0600, Smita Koralahalli wrote:
>> Include struct mce_bank member "init" in the bitfield by changing its type
>> from bool to get rid of unnecessary padding and to reduce the overall
>> struct size.
>>
>> Outputs collected before and after the change.
>>
>> $ pahole -C mce_bank arch/x86/kernel/cpu/mce/core.o
>>
>> before:
>>
>> /* size: 24, cachelines: 1, members: 5 */
>> /* bit holes: 1, sum bit holes: 62 bits */
>> /* bit_padding: 2 bits */
>> /* last cacheline: 24 bytes */
>>
>> after:
>>
>> /* size: 16, cachelines: 1, members: 5 */
>> /* last cacheline: 16 bytes */
> I don't mind cleanups like that but when you send them as part of a set
> adding new functionality, the usual rule is to put bug fixes, cleanups,
> improvements, etc to the existing code *first*, and then, ontop you add
> your new code.
>
> IOW, this patch should be first in your set.
>
> Thx.
>
Thanks for letting me know.
Will correct and move it to first in the next series.
Thanks,
Smita
On Fri, Feb 11, 2022 at 04:34:39PM -0600, Smita Koralahalli wrote:
> Move MCA_ADDR[ErrorAddr] extraction into a separate helper function. This
> will be further refactored in the next patch.
The concept of "next patch" can mean different commits in git so pls
reword your commit message to be void of the patch linearity concept.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 2/22/22 9:35 AM, Borislav Petkov wrote:
> On Fri, Feb 11, 2022 at 04:34:39PM -0600, Smita Koralahalli wrote:
>> Move MCA_ADDR[ErrorAddr] extraction into a separate helper function. This
>> will be further refactored in the next patch.
> The concept of "next patch" can mean different commits in git so pls
> reword your commit message to be void of the patch linearity concept.
>
Will correct.
Thanks,
Smita
On 2/22/22 3:15 PM, Borislav Petkov wrote:
> On Tue, Feb 22, 2022 at 02:47:44PM -0600, Koralahalli Channabasappa, Smita wrote:
>> But what do you think of severity? Will this make an impact when handling
>> panic severity levels? .. mce_severity_amd_smca().
> Well, look at the code: severity grading gets called when either polling
> or #MC handler gets to log an MCE. Reading an MSR costs a couple of
> hundred cycles. The whole MCE logging path costs maybe a couple of
> *orders* of magnitude more so that MSR read is in the noise when you
> have a 4GHz CPU executing 4 billion cycles per second.
>
> Now, that's for a single MCE.
>
> If it were more, say 10s, 100s, 1000s MCEs, then the MSR read is the
> least of your problems.
>
> But this is me conjecturing - I'm always interested in a real proof
> where it shows or it does not.
>
> I guess what I'm trying to say is, yeah, sure, speed is mostly a good
> argument. But you always need to consider at what cost you'd get that
> speed. And if at all. There are other important things like keeping the
> code base maintainable, readable and able to accept modifications for
> new features.
>
> So there's always this question of balance that needs to be asked...
>
Okay, this makes sense to me now. Thanks for the explanation. I will
drop this patch in the next series.
Thanks,
Smita