2019-04-30 20:33:44

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v3 0/6] Handle MCA banks in a per_cpu way

From: Yazen Ghannam <[email protected]>

The focus of this patchset is define and use the MCA bank structures
and bank count per logical CPU.

With the exception of patch 4, this set applies to systems in production
today.

Patch 1:
Moves the declaration of struct mce_banks[] to the only file it's used.

Patch 2:
Splits struct mce_bank into a structure for fields common to MCA banks
on all CPUs and another structure that can be used per_cpu.

Patch 3:
Brings full circle the saga of the threshold block addresses on SMCA
systems. After taking a step back and reviewing the AMD documentation, I
think that this implimentation is the simplest and more robust way to
follow the spec.

Patch 4:
Saves and uses the MCA bank count as a per_cpu variable. This is to
support systems that have MCA bank counts that are different between
logical CPUs.

Patch 5:
Makes sure that sysfs reports the MCA_CTL value as set in hardware. This
is not something related to making things per_cpu but rather just
something I noticed while working on the other patches.

Patch 6:
Prevents sysfs access for MCA banks that are uninitialized.

Link:
https://lkml.kernel.org/r/[email protected]

Thanks,
Yazen

Yazen Ghannam (6):
x86/MCE: Make struct mce_banks[] static
x86/MCE: Handle MCA controls in a per_cpu way
x86/MCE/AMD: Don't cache block addresses on SMCA systems
x86/MCE: Make number of MCA banks per_cpu
x86/MCE: Save MCA control bits that get set in hardware
x86/MCE: Treat MCE bank as initialized if control bits set in hardware

arch/x86/kernel/cpu/mce/amd.c | 90 ++++++++++----------
arch/x86/kernel/cpu/mce/core.c | 131 +++++++++++++++++++++--------
arch/x86/kernel/cpu/mce/internal.h | 12 +--
3 files changed, 142 insertions(+), 91 deletions(-)

--
2.17.1


2019-04-30 20:33:55

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v3 2/6] x86/MCE: Handle MCA controls in a per_cpu way

From: Yazen Ghannam <[email protected]>

Current AMD systems have unique MCA banks per logical CPU even though
the type of the banks may all align to the same bank number. Each CPU
will have control of a set of MCA banks in the hardware and these are
not shared with other CPUs.

For example, bank 0 may be the Load-Store Unit on every logical CPU, but
each bank 0 is a unique structure in the hardware. In other words, there
isn't a *single* Load-Store Unit at MCA bank 0 that all logical CPUs
share.

This idea extends even to non-core MCA banks. For example, CPU0 and CPU4
may see a Unified Memory Controller at bank 15, but each CPU is actually
seeing a unique hardware structure that is not shared with other CPUs.

Because the MCA banks are all unique hardware structures, it would be
good to control them in a more granular way. For example, if there is a
known issue with the Floating Point Unit on CPU5 and a user wishes to
disable an error type on the Floating Point Unit, then it would be good
to do this only for CPU5 rather than all CPUs.

Also, future AMD systems may have heterogeneous MCA banks. Meaning the
bank numbers may not necessarily represent the same types between CPUs.
For example, bank 20 visible to CPU0 may be a Unified Memory Controller
and bank 20 visible to CPU4 may be a Coherent Slave. So granular control
will be even more necessary should the user wish to control specific MCA
banks.

Split the device attributes from struct mce_bank leaving only the MCA
bank control fields.

Make struct mce_banks[] per_cpu in order to have more granular control
over individual MCA banks in the hardware.

Allocate the device attributes statically based on the maximum number of
MCA banks supported. The sysfs interface will use as many as needed per
CPU. Currently, this is set to mca_cfg.banks, but will be changed to a
per_cpu bank count in a future patch.

Allocate the MCA control bits dynamically. Use the maximum number of MCA
banks supported for now. This will be changed to a per_cpu bank count in
a future patch.

Redo the sysfs store/show functions to handle the per_cpu mce_banks[].

Signed-off-by: Yazen Ghannam <[email protected]>
---
Link:
https://lkml.kernel.org/r/[email protected]

v2->v3:
* Keep old member alignment in struct mce_bank.
* Change "cpu" to "CPU" in modified comment.
* Use a local array pointer when doing multiple per_cpu accesses.

v1->v2:
* Change "struct mce_bank*" to "struct mce_bank *" in definition.

arch/x86/kernel/cpu/mce/core.c | 59 ++++++++++++++++++++++++++--------
1 file changed, 45 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index ba5767dd5538..66347bdc8b08 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -64,16 +64,21 @@ static DEFINE_MUTEX(mce_sysfs_mutex);

DEFINE_PER_CPU(unsigned, mce_exception_count);

-#define ATTR_LEN 16
-/* One object for each MCE bank, shared by all CPUs */
struct mce_bank {
u64 ctl; /* subevents to enable */
bool init; /* initialise bank? */
+};
+static DEFINE_PER_CPU_READ_MOSTLY(struct mce_bank *, mce_banks_percpu);
+
+#define ATTR_LEN 16
+/* One object for each MCE bank, shared by all CPUs */
+struct mce_bank_dev {
struct device_attribute attr; /* device attribute */
char attrname[ATTR_LEN]; /* attribute name */
+ u8 bank; /* bank number */
};
+static struct mce_bank_dev mce_bank_devs[MAX_NR_BANKS];

-static struct mce_bank *mce_banks __read_mostly;
struct mce_vendor_flags mce_flags __read_mostly;

struct mca_config mca_cfg __read_mostly = {
@@ -683,6 +688,7 @@ DEFINE_PER_CPU(unsigned, mce_poll_count);
*/
bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
{
+ struct mce_bank *mce_banks = this_cpu_read(mce_banks_percpu);
bool error_seen = false;
struct mce m;
int i;
@@ -1130,6 +1136,7 @@ static void __mc_scan_banks(struct mce *m, struct mce *final,
unsigned long *toclear, unsigned long *valid_banks,
int no_way_out, int *worst)
{
+ struct mce_bank *mce_banks = this_cpu_read(mce_banks_percpu);
struct mca_config *cfg = &mca_cfg;
int severity, i;

@@ -1473,6 +1480,7 @@ EXPORT_SYMBOL_GPL(mce_notify_irq);

static int __mcheck_cpu_mce_banks_init(void)
{
+ struct mce_bank *mce_banks;
int i;

mce_banks = kcalloc(MAX_NR_BANKS, sizeof(struct mce_bank), GFP_KERNEL);
@@ -1485,6 +1493,8 @@ static int __mcheck_cpu_mce_banks_init(void)
b->ctl = -1ULL;
b->init = 1;
}
+
+ per_cpu(mce_banks_percpu, smp_processor_id()) = mce_banks;
return 0;
}

@@ -1504,7 +1514,7 @@ static int __mcheck_cpu_cap_init(void)

mca_cfg.banks = max(mca_cfg.banks, b);

- if (!mce_banks) {
+ if (!this_cpu_read(mce_banks_percpu)) {
int err = __mcheck_cpu_mce_banks_init();
if (err)
return err;
@@ -1544,6 +1554,7 @@ static void __mcheck_cpu_init_generic(void)

static void __mcheck_cpu_init_clear_banks(void)
{
+ struct mce_bank *mce_banks = this_cpu_read(mce_banks_percpu);
int i;

for (i = 0; i < mca_cfg.banks; i++) {
@@ -1587,6 +1598,7 @@ static void quirk_sandybridge_ifu(int bank, struct mce *m, struct pt_regs *regs)
/* Add per CPU specific workarounds here */
static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
{
+ struct mce_bank *mce_banks = this_cpu_read(mce_banks_percpu);
struct mca_config *cfg = &mca_cfg;

if (c->x86_vendor == X86_VENDOR_UNKNOWN) {
@@ -1957,6 +1969,7 @@ int __init mcheck_init(void)
*/
static void mce_disable_error_reporting(void)
{
+ struct mce_bank *mce_banks = this_cpu_read(mce_banks_percpu);
int i;

for (i = 0; i < mca_cfg.banks; i++) {
@@ -2059,26 +2072,41 @@ static struct bus_type mce_subsys = {

DEFINE_PER_CPU(struct device *, mce_device);

-static inline struct mce_bank *attr_to_bank(struct device_attribute *attr)
+static inline struct mce_bank_dev *attr_to_bank(struct device_attribute *attr)
{
- return container_of(attr, struct mce_bank, attr);
+ return container_of(attr, struct mce_bank_dev, attr);
}

static ssize_t show_bank(struct device *s, struct device_attribute *attr,
char *buf)
{
- return sprintf(buf, "%llx\n", attr_to_bank(attr)->ctl);
+ u8 bank = attr_to_bank(attr)->bank;
+ struct mce_bank *b;
+
+ if (bank >= mca_cfg.banks)
+ return -EINVAL;
+
+ b = &per_cpu(mce_banks_percpu, s->id)[bank];
+
+ return sprintf(buf, "%llx\n", b->ctl);
}

static ssize_t set_bank(struct device *s, struct device_attribute *attr,
const char *buf, size_t size)
{
+ u8 bank = attr_to_bank(attr)->bank;
+ struct mce_bank *b;
u64 new;

if (kstrtou64(buf, 0, &new) < 0)
return -EINVAL;

- attr_to_bank(attr)->ctl = new;
+ if (bank >= mca_cfg.banks)
+ return -EINVAL;
+
+ b = &per_cpu(mce_banks_percpu, s->id)[bank];
+
+ b->ctl = new;
mce_restart();

return size;
@@ -2193,7 +2221,7 @@ static void mce_device_release(struct device *dev)
kfree(dev);
}

-/* Per cpu device init. All of the cpus still share the same ctrl bank: */
+/* Per CPU device init. All of the CPUs still share the same bank device: */
static int mce_device_create(unsigned int cpu)
{
struct device *dev;
@@ -2226,7 +2254,7 @@ static int mce_device_create(unsigned int cpu)
goto error;
}
for (j = 0; j < mca_cfg.banks; j++) {
- err = device_create_file(dev, &mce_banks[j].attr);
+ err = device_create_file(dev, &mce_bank_devs[j].attr);
if (err)
goto error2;
}
@@ -2236,7 +2264,7 @@ static int mce_device_create(unsigned int cpu)
return 0;
error2:
while (--j >= 0)
- device_remove_file(dev, &mce_banks[j].attr);
+ device_remove_file(dev, &mce_bank_devs[j].attr);
error:
while (--i >= 0)
device_remove_file(dev, mce_device_attrs[i]);
@@ -2258,7 +2286,7 @@ static void mce_device_remove(unsigned int cpu)
device_remove_file(dev, mce_device_attrs[i]);

for (i = 0; i < mca_cfg.banks; i++)
- device_remove_file(dev, &mce_banks[i].attr);
+ device_remove_file(dev, &mce_bank_devs[i].attr);

device_unregister(dev);
cpumask_clear_cpu(cpu, mce_device_initialized);
@@ -2279,6 +2307,7 @@ static void mce_disable_cpu(void)

static void mce_reenable_cpu(void)
{
+ struct mce_bank *mce_banks = this_cpu_read(mce_banks_percpu);
int i;

if (!mce_available(raw_cpu_ptr(&cpu_info)))
@@ -2336,10 +2365,12 @@ static __init void mce_init_banks(void)
{
int i;

- for (i = 0; i < mca_cfg.banks; i++) {
- struct mce_bank *b = &mce_banks[i];
+ for (i = 0; i < MAX_NR_BANKS; i++) {
+ struct mce_bank_dev *b = &mce_bank_devs[i];
struct device_attribute *a = &b->attr;

+ b->bank = i;
+
sysfs_attr_init(&a->attr);
a->attr.name = b->attrname;
snprintf(b->attrname, ATTR_LEN, "bank%d", i);
--
2.17.1

2019-04-30 20:33:57

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v3 6/6] x86/MCE: Treat MCE bank as initialized if control bits set in hardware

From: Yazen Ghannam <[email protected]>

The OS is expected to write all bits to MCA_CTL for each bank. However,
some banks may be unused in which case the registers for such banks are
Read-as-Zero/Writes-Ignored. Also, the OS may not write any control bits
because of quirks, etc.

A bank can be considered uninitialized if the MCA_CTL register returns
zero. This is because either the OS did not write anything or because
the hardware is enforcing RAZ/WI for the bank.

Set a bank's init value based on if the control bits are set or not in
hardware.

Return an error code in the sysfs interface for uninitialized banks.

Signed-off-by: Yazen Ghannam <[email protected]>
---
Link:
https://lkml.kernel.org/r/[email protected]

v2->v3:
* No change.

v1->v2:
* New in v2.
* Based on discussion from v1 patch 2.

arch/x86/kernel/cpu/mce/core.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 551366c155ef..e59947e10ee0 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1574,6 +1574,9 @@ static void __mcheck_cpu_init_clear_banks(void)

/* Save bits set in hardware. */
rdmsrl(msr_ops.ctl(i), b->ctl);
+
+ /* Bank is initialized if bits are set in hardware. */
+ b->init = !!b->ctl;
}
}

@@ -2098,6 +2101,9 @@ static ssize_t show_bank(struct device *s, struct device_attribute *attr,

b = &per_cpu(mce_banks_percpu, s->id)[bank];

+ if (!b->init)
+ return -ENODEV;
+
return sprintf(buf, "%llx\n", b->ctl);
}

@@ -2116,6 +2122,9 @@ static ssize_t set_bank(struct device *s, struct device_attribute *attr,

b = &per_cpu(mce_banks_percpu, s->id)[bank];

+ if (!b->init)
+ return -ENODEV;
+
b->ctl = new;
mce_restart();

--
2.17.1

2019-04-30 20:34:06

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v3 4/6] x86/MCE: Make number of MCA banks per_cpu

From: Yazen Ghannam <[email protected]>

The number of MCA banks is provided per logical CPU. Historically, this
number has been the same across all CPUs, but this is not an
architectural guarantee. Future AMD systems may have MCA bank counts
that vary between logical CPUs in a system.

This issue was partially addressed in

006c077041dc ("x86/mce: Handle varying MCA bank counts")

by allocating structures using the maximum number of MCA banks and by
saving the maximum MCA bank count in a system as the global count. This
means that some extra structures are allocated. Also, this means that
CPUs will spend more time in the #MC and other handlers checking extra
MCA banks.

Define the number of MCA banks as a per_cpu variable. Replace all uses
of mca_cfg.banks with this.

Also, use the per_cpu bank count when allocating per_cpu structures.

Signed-off-by: Yazen Ghannam <[email protected]>
---
Link:
https://lkml.kernel.org/r/[email protected]

v2->v3:
* Drop pr_debug() message.
* Change commit reference format.

v1->v2:
* Drop export of new variable and leave injector code as-is.
* Add "mce_" prefix to new "num_banks" variable.

arch/x86/kernel/cpu/mce/amd.c | 17 ++++++-----
arch/x86/kernel/cpu/mce/core.c | 47 +++++++++++++++++-------------
arch/x86/kernel/cpu/mce/internal.h | 2 +-
3 files changed, 36 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index d4d6e4b7f9dc..9f729d50676c 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -495,7 +495,7 @@ static u32 get_block_address(u32 current_addr, u32 low, u32 high,
{
u32 addr = 0, offset = 0;

- if ((bank >= mca_cfg.banks) || (block >= NR_BLOCKS))
+ if ((bank >= per_cpu(mce_num_banks, cpu)) || (block >= NR_BLOCKS))
return addr;

if (mce_flags.smca)
@@ -631,7 +631,8 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
unsigned int bank, block, cpu = smp_processor_id();
int offset = -1;

- for (bank = 0; bank < mca_cfg.banks; ++bank) {
+
+ for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank) {
if (mce_flags.smca)
smca_configure(bank, cpu);

@@ -976,7 +977,7 @@ static void amd_deferred_error_interrupt(void)
{
unsigned int bank;

- for (bank = 0; bank < mca_cfg.banks; ++bank)
+ for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank)
log_error_deferred(bank);
}

@@ -1017,7 +1018,7 @@ static void amd_threshold_interrupt(void)
struct threshold_block *first_block = NULL, *block = NULL, *tmp = NULL;
unsigned int bank, cpu = smp_processor_id();

- for (bank = 0; bank < mca_cfg.banks; ++bank) {
+ for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank) {
if (!(per_cpu(bank_map, cpu) & (1 << bank)))
continue;

@@ -1204,7 +1205,7 @@ static int allocate_threshold_blocks(unsigned int cpu, unsigned int bank,
u32 low, high;
int err;

- if ((bank >= mca_cfg.banks) || (block >= NR_BLOCKS))
+ if ((bank >= per_cpu(mce_num_banks, cpu)) || (block >= NR_BLOCKS))
return 0;

if (rdmsr_safe_on_cpu(cpu, address, &low, &high))
@@ -1438,7 +1439,7 @@ int mce_threshold_remove_device(unsigned int cpu)
{
unsigned int bank;

- for (bank = 0; bank < mca_cfg.banks; ++bank) {
+ for (bank = 0; bank < per_cpu(mce_num_banks, cpu); ++bank) {
if (!(per_cpu(bank_map, cpu) & (1 << bank)))
continue;
threshold_remove_bank(cpu, bank);
@@ -1459,14 +1460,14 @@ int mce_threshold_create_device(unsigned int cpu)
if (bp)
return 0;

- bp = kcalloc(mca_cfg.banks, sizeof(struct threshold_bank *),
+ bp = kcalloc(per_cpu(mce_num_banks, cpu), sizeof(struct threshold_bank *),
GFP_KERNEL);
if (!bp)
return -ENOMEM;

per_cpu(threshold_banks, cpu) = bp;

- for (bank = 0; bank < mca_cfg.banks; ++bank) {
+ for (bank = 0; bank < per_cpu(mce_num_banks, cpu); ++bank) {
if (!(per_cpu(bank_map, cpu) & (1 << bank)))
continue;
err = threshold_create_bank(cpu, bank);
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 66347bdc8b08..986de830f26e 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -64,6 +64,8 @@ static DEFINE_MUTEX(mce_sysfs_mutex);

DEFINE_PER_CPU(unsigned, mce_exception_count);

+DEFINE_PER_CPU_READ_MOSTLY(u8, mce_num_banks);
+
struct mce_bank {
u64 ctl; /* subevents to enable */
bool init; /* initialise bank? */
@@ -700,7 +702,7 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
if (flags & MCP_TIMESTAMP)
m.tsc = rdtsc();

- for (i = 0; i < mca_cfg.banks; i++) {
+ for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
if (!mce_banks[i].ctl || !test_bit(i, *b))
continue;

@@ -802,7 +804,7 @@ static int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp,
char *tmp;
int i;

- for (i = 0; i < mca_cfg.banks; i++) {
+ for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
m->status = mce_rdmsrl(msr_ops.status(i));
if (!(m->status & MCI_STATUS_VAL))
continue;
@@ -1082,7 +1084,7 @@ static void mce_clear_state(unsigned long *toclear)
{
int i;

- for (i = 0; i < mca_cfg.banks; i++) {
+ for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
if (test_bit(i, toclear))
mce_wrmsrl(msr_ops.status(i), 0);
}
@@ -1140,7 +1142,7 @@ static void __mc_scan_banks(struct mce *m, struct mce *final,
struct mca_config *cfg = &mca_cfg;
int severity, i;

- for (i = 0; i < cfg->banks; i++) {
+ for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
__clear_bit(i, toclear);
if (!test_bit(i, valid_banks))
continue;
@@ -1480,14 +1482,15 @@ EXPORT_SYMBOL_GPL(mce_notify_irq);

static int __mcheck_cpu_mce_banks_init(void)
{
+ u8 n_banks = this_cpu_read(mce_num_banks);
struct mce_bank *mce_banks;
int i;

- mce_banks = kcalloc(MAX_NR_BANKS, sizeof(struct mce_bank), GFP_KERNEL);
+ mce_banks = kcalloc(n_banks, sizeof(struct mce_bank), GFP_KERNEL);
if (!mce_banks)
return -ENOMEM;

- for (i = 0; i < MAX_NR_BANKS; i++) {
+ for (i = 0; i < n_banks; i++) {
struct mce_bank *b = &mce_banks[i];

b->ctl = -1ULL;
@@ -1509,10 +1512,14 @@ static int __mcheck_cpu_cap_init(void)
rdmsrl(MSR_IA32_MCG_CAP, cap);

b = cap & MCG_BANKCNT_MASK;
- if (WARN_ON_ONCE(b > MAX_NR_BANKS))
+
+ if (b > MAX_NR_BANKS) {
+ pr_warn("CPU%d: Using only %u machine check banks out of %u\n",
+ smp_processor_id(), MAX_NR_BANKS, b);
b = MAX_NR_BANKS;
+ }

- mca_cfg.banks = max(mca_cfg.banks, b);
+ this_cpu_write(mce_num_banks, b);

if (!this_cpu_read(mce_banks_percpu)) {
int err = __mcheck_cpu_mce_banks_init();
@@ -1557,7 +1564,7 @@ static void __mcheck_cpu_init_clear_banks(void)
struct mce_bank *mce_banks = this_cpu_read(mce_banks_percpu);
int i;

- for (i = 0; i < mca_cfg.banks; i++) {
+ for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
struct mce_bank *b = &mce_banks[i];

if (!b->init)
@@ -1608,7 +1615,7 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)

/* This should be disabled by the BIOS, but isn't always */
if (c->x86_vendor == X86_VENDOR_AMD) {
- if (c->x86 == 15 && cfg->banks > 4) {
+ if (c->x86 == 15 && this_cpu_read(mce_num_banks) > 4) {
/*
* disable GART TBL walk error reporting, which
* trips off incorrectly with the IOMMU & 3ware
@@ -1627,7 +1634,7 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
* Various K7s with broken bank 0 around. Always disable
* by default.
*/
- if (c->x86 == 6 && cfg->banks > 0)
+ if (c->x86 == 6 && this_cpu_read(mce_num_banks) > 0)
mce_banks[0].ctl = 0;

/*
@@ -1649,7 +1656,7 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
* valid event later, merely don't write CTL0.
*/

- if (c->x86 == 6 && c->x86_model < 0x1A && cfg->banks > 0)
+ if (c->x86 == 6 && c->x86_model < 0x1A && this_cpu_read(mce_num_banks) > 0)
mce_banks[0].init = 0;

/*
@@ -1883,7 +1890,7 @@ static void __mce_disable_bank(void *arg)

void mce_disable_bank(int bank)
{
- if (bank >= mca_cfg.banks) {
+ if (bank >= this_cpu_read(mce_num_banks)) {
pr_warn(FW_BUG
"Ignoring request to disable invalid MCA bank %d.\n",
bank);
@@ -1972,7 +1979,7 @@ static void mce_disable_error_reporting(void)
struct mce_bank *mce_banks = this_cpu_read(mce_banks_percpu);
int i;

- for (i = 0; i < mca_cfg.banks; i++) {
+ for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
struct mce_bank *b = &mce_banks[i];

if (b->init)
@@ -2083,7 +2090,7 @@ static ssize_t show_bank(struct device *s, struct device_attribute *attr,
u8 bank = attr_to_bank(attr)->bank;
struct mce_bank *b;

- if (bank >= mca_cfg.banks)
+ if (bank >= per_cpu(mce_num_banks, s->id))
return -EINVAL;

b = &per_cpu(mce_banks_percpu, s->id)[bank];
@@ -2101,7 +2108,7 @@ static ssize_t set_bank(struct device *s, struct device_attribute *attr,
if (kstrtou64(buf, 0, &new) < 0)
return -EINVAL;

- if (bank >= mca_cfg.banks)
+ if (bank >= per_cpu(mce_num_banks, s->id))
return -EINVAL;

b = &per_cpu(mce_banks_percpu, s->id)[bank];
@@ -2253,7 +2260,7 @@ static int mce_device_create(unsigned int cpu)
if (err)
goto error;
}
- for (j = 0; j < mca_cfg.banks; j++) {
+ for (j = 0; j < per_cpu(mce_num_banks, cpu); j++) {
err = device_create_file(dev, &mce_bank_devs[j].attr);
if (err)
goto error2;
@@ -2285,7 +2292,7 @@ static void mce_device_remove(unsigned int cpu)
for (i = 0; mce_device_attrs[i]; i++)
device_remove_file(dev, mce_device_attrs[i]);

- for (i = 0; i < mca_cfg.banks; i++)
+ for (i = 0; i < per_cpu(mce_num_banks, cpu); i++)
device_remove_file(dev, &mce_bank_devs[i].attr);

device_unregister(dev);
@@ -2315,7 +2322,7 @@ static void mce_reenable_cpu(void)

if (!cpuhp_tasks_frozen)
cmci_reenable();
- for (i = 0; i < mca_cfg.banks; i++) {
+ for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
struct mce_bank *b = &mce_banks[i];

if (b->init)
@@ -2503,8 +2510,6 @@ EXPORT_SYMBOL_GPL(mcsafe_key);

static int __init mcheck_late_init(void)
{
- pr_info("Using %d MCE banks\n", mca_cfg.banks);
-
if (mca_cfg.recovery)
static_branch_inc(&mcsafe_key);

diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 35b3e5c02c1c..5215dcd01614 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -118,7 +118,6 @@ struct mca_config {
bios_cmci_threshold : 1,
__reserved : 59;

- u8 banks;
s8 bootlog;
int tolerant;
int monarch_timeout;
@@ -127,6 +126,7 @@ struct mca_config {
};

extern struct mca_config mca_cfg;
+DECLARE_PER_CPU_READ_MOSTLY(u8, mce_num_banks);

struct mce_vendor_flags {
/*
--
2.17.1

2019-04-30 20:34:36

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v3 3/6] x86/MCE/AMD: Don't cache block addresses on SMCA systems

From: Yazen Ghannam <[email protected]>

On legacy systems, the addresses of the MCA_MISC* registers need to be
recursively discovered based on a Block Pointer field in the registers.

On Scalable MCA systems, the register space is fixed, and particular
addresses can be derived by regular offsets for bank and register type.
This fixed address space includes the MCA_MISC* registers.

MCA_MISC0 is always available for each MCA bank. MCA_MISC1 through
MCA_MISC4 are considered available if MCA_MISC0[BlkPtr]=1.

Cache the value of MCA_MISC0[BlkPtr] for each bank and per CPU. This
needs to be done only during init. The values should be saved per CPU
to accommodate heterogeneous SMCA systems.

Redo smca_get_block_address() to directly return the block addresses.
Use the cached Block Pointer value to decide if the MCA_MISC1-4
addresses should be returned.

Signed-off-by: Yazen Ghannam <[email protected]>
---
Link:
https://lkml.kernel.org/r/[email protected]

v2->v3:
* Change name of new map variable to "smca_misc_banks_map".
* Use "BIT()" where appropriate.

v1->v2:
* No change.

arch/x86/kernel/cpu/mce/amd.c | 73 ++++++++++++++++++-----------------
1 file changed, 37 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index d904aafe6409..d4d6e4b7f9dc 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -101,11 +101,6 @@ static struct smca_bank_name smca_names[] = {
[SMCA_PCIE] = { "pcie", "PCI Express Unit" },
};

-static u32 smca_bank_addrs[MAX_NR_BANKS][NR_BLOCKS] __ro_after_init =
-{
- [0 ... MAX_NR_BANKS - 1] = { [0 ... NR_BLOCKS - 1] = -1 }
-};
-
static const char *smca_get_name(enum smca_bank_types t)
{
if (t >= N_SMCA_BANK_TYPES)
@@ -199,6 +194,9 @@ static char buf_mcatype[MAX_MCATYPE_NAME_LEN];
static DEFINE_PER_CPU(struct threshold_bank **, threshold_banks);
static DEFINE_PER_CPU(unsigned int, bank_map); /* see which banks are on */

+/* Map of banks that have more than MCA_MISC0 available. */
+static DEFINE_PER_CPU(u32, smca_misc_banks_map);
+
static void amd_threshold_interrupt(void);
static void amd_deferred_error_interrupt(void);

@@ -208,6 +206,28 @@ static void default_deferred_error_interrupt(void)
}
void (*deferred_error_int_vector)(void) = default_deferred_error_interrupt;

+static void smca_set_misc_banks_map(unsigned int bank, unsigned int cpu)
+{
+ u32 low, high;
+
+ /*
+ * For SMCA enabled processors, BLKPTR field of the first MISC register
+ * (MCx_MISC0) indicates presence of additional MISC regs set (MISC1-4).
+ */
+ if (rdmsr_safe(MSR_AMD64_SMCA_MCx_CONFIG(bank), &low, &high))
+ return;
+
+ if (!(low & MCI_CONFIG_MCAX))
+ return;
+
+ if (rdmsr_safe(MSR_AMD64_SMCA_MCx_MISC(bank), &low, &high))
+ return;
+
+ if (low & MASK_BLKPTR_LO)
+ per_cpu(smca_misc_banks_map, cpu) |= BIT(bank);
+
+}
+
static void smca_configure(unsigned int bank, unsigned int cpu)
{
unsigned int i, hwid_mcatype;
@@ -245,6 +265,8 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
wrmsr(smca_config, low, high);
}

+ smca_set_misc_banks_map(bank, cpu);
+
/* Return early if this bank was already initialized. */
if (smca_banks[bank].hwid)
return;
@@ -455,42 +477,21 @@ static void deferred_error_interrupt_enable(struct cpuinfo_x86 *c)
wrmsr(MSR_CU_DEF_ERR, low, high);
}

-static u32 smca_get_block_address(unsigned int bank, unsigned int block)
+static u32 smca_get_block_address(unsigned int bank, unsigned int block,
+ unsigned int cpu)
{
- u32 low, high;
- u32 addr = 0;
-
- if (smca_get_bank_type(bank) == SMCA_RESERVED)
- return addr;
-
if (!block)
return MSR_AMD64_SMCA_MCx_MISC(bank);

- /* Check our cache first: */
- if (smca_bank_addrs[bank][block] != -1)
- return smca_bank_addrs[bank][block];
-
- /*
- * For SMCA enabled processors, BLKPTR field of the first MISC register
- * (MCx_MISC0) indicates presence of additional MISC regs set (MISC1-4).
- */
- if (rdmsr_safe(MSR_AMD64_SMCA_MCx_CONFIG(bank), &low, &high))
- goto out;
-
- if (!(low & MCI_CONFIG_MCAX))
- goto out;
-
- if (!rdmsr_safe(MSR_AMD64_SMCA_MCx_MISC(bank), &low, &high) &&
- (low & MASK_BLKPTR_LO))
- addr = MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1);
+ if (!(per_cpu(smca_misc_banks_map, cpu) & BIT(bank)))
+ return 0;

-out:
- smca_bank_addrs[bank][block] = addr;
- return addr;
+ return MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1);
}

static u32 get_block_address(u32 current_addr, u32 low, u32 high,
- unsigned int bank, unsigned int block)
+ unsigned int bank, unsigned int block,
+ unsigned int cpu)
{
u32 addr = 0, offset = 0;

@@ -498,7 +499,7 @@ static u32 get_block_address(u32 current_addr, u32 low, u32 high,
return addr;

if (mce_flags.smca)
- return smca_get_block_address(bank, block);
+ return smca_get_block_address(bank, block, cpu);

/* Fall back to method we used for older processors: */
switch (block) {
@@ -637,7 +638,7 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
disable_err_thresholding(c, bank);

for (block = 0; block < NR_BLOCKS; ++block) {
- address = get_block_address(address, low, high, bank, block);
+ address = get_block_address(address, low, high, bank, block, cpu);
if (!address)
break;

@@ -1254,7 +1255,7 @@ static int allocate_threshold_blocks(unsigned int cpu, unsigned int bank,
if (err)
goto out_free;
recurse:
- address = get_block_address(address, low, high, bank, ++block);
+ address = get_block_address(address, low, high, bank, ++block, cpu);
if (!address)
return 0;

--
2.17.1

2019-04-30 20:35:02

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v3 1/6] x86/MCE: Make struct mce_banks[] static

From: Yazen Ghannam <[email protected]>

The struct mce_banks[] array is only used in mce/core.c so move the
definition of struct mce_bank to mce/core.c and make the array static.

Also, change the "init" field to bool type.

Signed-off-by: Yazen Ghannam <[email protected]>
---
Link:
https://lkml.kernel.org/r/[email protected]

v2->v3:
* No changes

v1->v2:
* No changes

arch/x86/kernel/cpu/mce/core.c | 11 ++++++++++-
arch/x86/kernel/cpu/mce/internal.h | 10 ----------
2 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 5112a50e6486..ba5767dd5538 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -64,7 +64,16 @@ static DEFINE_MUTEX(mce_sysfs_mutex);

DEFINE_PER_CPU(unsigned, mce_exception_count);

-struct mce_bank *mce_banks __read_mostly;
+#define ATTR_LEN 16
+/* One object for each MCE bank, shared by all CPUs */
+struct mce_bank {
+ u64 ctl; /* subevents to enable */
+ bool init; /* initialise bank? */
+ struct device_attribute attr; /* device attribute */
+ char attrname[ATTR_LEN]; /* attribute name */
+};
+
+static struct mce_bank *mce_banks __read_mostly;
struct mce_vendor_flags mce_flags __read_mostly;

struct mca_config mca_cfg __read_mostly = {
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index a34b55baa7aa..35b3e5c02c1c 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -22,17 +22,8 @@ enum severity_level {

extern struct blocking_notifier_head x86_mce_decoder_chain;

-#define ATTR_LEN 16
#define INITIAL_CHECK_INTERVAL 5 * 60 /* 5 minutes */

-/* One object for each MCE bank, shared by all CPUs */
-struct mce_bank {
- u64 ctl; /* subevents to enable */
- unsigned char init; /* initialise bank? */
- struct device_attribute attr; /* device attribute */
- char attrname[ATTR_LEN]; /* attribute name */
-};
-
struct mce_evt_llist {
struct llist_node llnode;
struct mce mce;
@@ -47,7 +38,6 @@ struct llist_node *mce_gen_pool_prepare_records(void);
extern int (*mce_severity)(struct mce *a, int tolerant, char **msg, bool is_excp);
struct dentry *mce_get_debugfs_dir(void);

-extern struct mce_bank *mce_banks;
extern mce_banks_t mce_banks_ce_disabled;

#ifdef CONFIG_X86_MCE_INTEL
--
2.17.1

2019-04-30 20:35:59

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

From: Yazen Ghannam <[email protected]>

The OS is expected to write all bits in MCA_CTL. However, only
implemented bits get set in the hardware.

Read back MCA_CTL so that the value in the hardware is saved and
reported through sysfs.

Signed-off-by: Yazen Ghannam <[email protected]>
---
Link:
https://lkml.kernel.org/r/[email protected]

v2->v3:
* No change.

v1->v2:
* No change.

arch/x86/kernel/cpu/mce/core.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 986de830f26e..551366c155ef 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1567,10 +1567,13 @@ static void __mcheck_cpu_init_clear_banks(void)
for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
struct mce_bank *b = &mce_banks[i];

- if (!b->init)
- continue;
- wrmsrl(msr_ops.ctl(i), b->ctl);
- wrmsrl(msr_ops.status(i), 0);
+ if (b->init) {
+ wrmsrl(msr_ops.ctl(i), b->ctl);
+ wrmsrl(msr_ops.status(i), 0);
+ }
+
+ /* Save bits set in hardware. */
+ rdmsrl(msr_ops.ctl(i), b->ctl);
}
}

@@ -2325,8 +2328,10 @@ static void mce_reenable_cpu(void)
for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
struct mce_bank *b = &mce_banks[i];

- if (b->init)
+ if (b->init) {
wrmsrl(msr_ops.ctl(i), b->ctl);
+ rdmsrl(msr_ops.ctl(i), b->ctl);
+ }
}
}

--
2.17.1

2019-05-16 16:15:55

by Yazen Ghannam

[permalink] [raw]
Subject: RE: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

> -----Original Message-----
> From: Luck, Tony <[email protected]>
> Sent: Thursday, May 16, 2019 10:52 AM
> To: Ghannam, Yazen <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware
>
>
> On Tue, Apr 30, 2019 at 08:32:20PM +0000, Ghannam, Yazen wrote:
> > diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> > index 986de830f26e..551366c155ef 100644
> > --- a/arch/x86/kernel/cpu/mce/core.c
> > +++ b/arch/x86/kernel/cpu/mce/core.c
> > @@ -1567,10 +1567,13 @@ static void __mcheck_cpu_init_clear_banks(void)
> > for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
> > struct mce_bank *b = &mce_banks[i];
> >
> > - if (!b->init)
> > - continue;
> > - wrmsrl(msr_ops.ctl(i), b->ctl);
> > - wrmsrl(msr_ops.status(i), 0);
> > + if (b->init) {
> > + wrmsrl(msr_ops.ctl(i), b->ctl);
> > + wrmsrl(msr_ops.status(i), 0);
> > + }
> > +
> > + /* Save bits set in hardware. */
> > + rdmsrl(msr_ops.ctl(i), b->ctl);
> > }
> > }
>
> This looks like it will be a problem for Intel CPUs. If
> we take a CPU offline, and then bring it back again, we
> ues "b->ctl" to reinitialize the register in mce_reenable_cpu().
>
> But Intel SDM says at the end of section "15.3.2.1 IA32_MCi_CTL_MSRs"
>
> "P6 family processors only allow the writing of all 1s or all
> 0s to the IA32_MCi_CTL MSR."
>

I can put a vendor check on the read. Is that sufficient?

Thanks,
Yazen

2019-05-16 17:23:21

by Luck, Tony

[permalink] [raw]
Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

On Tue, Apr 30, 2019 at 08:32:20PM +0000, Ghannam, Yazen wrote:
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 986de830f26e..551366c155ef 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -1567,10 +1567,13 @@ static void __mcheck_cpu_init_clear_banks(void)
> for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
> struct mce_bank *b = &mce_banks[i];
>
> - if (!b->init)
> - continue;
> - wrmsrl(msr_ops.ctl(i), b->ctl);
> - wrmsrl(msr_ops.status(i), 0);
> + if (b->init) {
> + wrmsrl(msr_ops.ctl(i), b->ctl);
> + wrmsrl(msr_ops.status(i), 0);
> + }
> +
> + /* Save bits set in hardware. */
> + rdmsrl(msr_ops.ctl(i), b->ctl);
> }
> }

This looks like it will be a problem for Intel CPUs. If
we take a CPU offline, and then bring it back again, we
ues "b->ctl" to reinitialize the register in mce_reenable_cpu().

But Intel SDM says at the end of section "15.3.2.1 IA32_MCi_CTL_MSRs"

"P6 family processors only allow the writing of all 1s or all
0s to the IA32_MCi_CTL MSR."

-Tony

2019-05-16 18:23:13

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

On Thu, May 16, 2019 at 04:14:14PM +0000, Ghannam, Yazen wrote:
> I can put a vendor check on the read. Is that sufficient?

Or we can drop this patch. Remind me again pls why do we need it?

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2019-05-16 18:31:05

by Yazen Ghannam

[permalink] [raw]
Subject: RE: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

> -----Original Message-----
> From: [email protected] <[email protected]> On Behalf Of Borislav Petkov
> Sent: Thursday, May 16, 2019 11:57 AM
> To: Ghannam, Yazen <[email protected]>
> Cc: Luck, Tony <[email protected]>; [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware
>
>
> On Thu, May 16, 2019 at 04:14:14PM +0000, Ghannam, Yazen wrote:
> > I can put a vendor check on the read. Is that sufficient?
>
> Or we can drop this patch. Remind me again pls why do we need it?
>

So that the sysfs files show the control values that are set in the hardware. It seemed like this would be more helpful than showing all 0xF's.

But I'm okay with dropping this patch. Patch 6 in this set depends on this, so it'll need to be dropped also.

Should I send out another version of this set?

Thanks,
Yazen

2019-05-16 19:49:38

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

On Thu, May 16, 2019 at 05:09:11PM +0000, Ghannam, Yazen wrote:
> So that the sysfs files show the control values that are set in the
> hardware. It seemed like this would be more helpful than showing all
> 0xF's.

Yeah, but it has been like that since forever and it hasn't bugged
anybody. Probably because anybody doesn't even look at those files. As
Tony says:

"RAS is a lonely subsystem ... even EDAC gets more love."

:-)))

And adding yet another vendor check for this seemed just not worth it.

> Should I send out another version of this set?

I simply zapped 5/6. I still think your 6/6 makes sense though.

---
From: Yazen Ghannam <[email protected]>
Date: Tue, 30 Apr 2019 20:32:21 +0000
Subject: [PATCH] x86/MCE: Determine MCA banks' init state properly

The OS is expected to write all bits to MCA_CTL for each bank,
thus enabling error reporting in all banks. However, some banks
may be unused in which case the registers for such banks are
Read-as-Zero/Writes-Ignored. Also, the OS may avoid setting some control
bits because of quirks, etc.

A bank can be considered uninitialized if the MCA_CTL register returns
zero. This is because either the OS did not write anything or because
the hardware is enforcing RAZ/WI for the bank.

Set a bank's init value based on if the control bits are set or not in
hardware. Return an error code in the sysfs interface for uninitialized
banks.

[ bp: Massage a bit. ]

Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "[email protected]" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: "[email protected]" <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/mce/core.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 5bcecadcf4d9..c049689f3d73 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1567,10 +1567,13 @@ static void __mcheck_cpu_init_clear_banks(void)
for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
struct mce_bank *b = &mce_banks[i];

- if (!b->init)
- continue;
- wrmsrl(msr_ops.ctl(i), b->ctl);
- wrmsrl(msr_ops.status(i), 0);
+ if (b->init) {
+ wrmsrl(msr_ops.ctl(i), b->ctl);
+ wrmsrl(msr_ops.status(i), 0);
+ }
+
+ /* Bank is initialized if bits are set in hardware. */
+ b->init = !!b->ctl;
}
}

@@ -2095,6 +2098,9 @@ static ssize_t show_bank(struct device *s, struct device_attribute *attr,

b = &per_cpu(mce_banks_array, s->id)[bank];

+ if (!b->init)
+ return -ENODEV;
+
return sprintf(buf, "%llx\n", b->ctl);
}

@@ -2113,6 +2119,9 @@ static ssize_t set_bank(struct device *s, struct device_attribute *attr,

b = &per_cpu(mce_banks_array, s->id)[bank];

+ if (!b->init)
+ return -ENODEV;
+
b->ctl = new;
mce_restart();

--
2.21.0

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2019-05-16 22:09:30

by Luck, Tony

[permalink] [raw]
Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

On Thu, May 16, 2019 at 10:34:56PM +0200, Borislav Petkov wrote:
> On Thu, May 16, 2019 at 08:20:58PM +0000, Ghannam, Yazen wrote:
> > We don't actually know if there are bits set in hardware until we read
> > it back. So I don't think this is adding anything new.
>
> Bah, of course. We need to read it first (pasting the whole function).
> Now, __mcheck_cpu_init_clear_banks() gets called when we change
> configuration too, in mce_cpu_restart() and if we do it this way, we'll
> be rereading MCi_CTL each time but I don't see anything wrong with that.

Intel doesn't "set any bits in hardware" ... so I think you'll just
get a 0x0 and disable everything.

>
> Hmmm?
>
> static void __mcheck_cpu_init_clear_banks(void)
> {
> struct mce_bank *mce_banks = this_cpu_read(mce_banks_array);
> int i;
>
> for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
> struct mce_bank *b = &mce_banks[i];
>
> rdmsrl(msr_ops.ctl(i), b->ctl);
>
> /* Bank is initialized if bits are set in hardware. */
> b->init = !!b->ctl;
> if (b->init) {
> wrmsrl(msr_ops.ctl(i), b->ctl);
> wrmsrl(msr_ops.status(i), 0);
> }
>
> }
> }


I think the intent of the original patch was to find out
which bits are "implemented in hardware". I.e. throw all
1's at the register and see if any of them stick.

I don't object to the idea behind the patch. But if you want
to do this you just should not modify b->ctl.

So something like:


static void __mcheck_cpu_init_clear_banks(void)
{
struct mce_bank *mce_banks = this_cpu_read(mce_banks_array);
u64 tmp;
int i;

for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
struct mce_bank *b = &mce_banks[i];

if (b->init) {
wrmsrl(msr_ops.ctl(i), b->ctl);
wrmsrl(msr_ops.status(i), 0);
rdmsrl(msr_ops.ctl(i), tmp);

/* Check if any bits implemented in h/w */
b->init = !!tmp;
}

}
}

-Tony

-Tony

2019-05-17 00:04:22

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

On Thu, May 16, 2019 at 08:20:58PM +0000, Ghannam, Yazen wrote:
> We don't actually know if there are bits set in hardware until we read
> it back. So I don't think this is adding anything new.

Bah, of course. We need to read it first (pasting the whole function).
Now, __mcheck_cpu_init_clear_banks() gets called when we change
configuration too, in mce_cpu_restart() and if we do it this way, we'll
be rereading MCi_CTL each time but I don't see anything wrong with that.

Hmmm?

static void __mcheck_cpu_init_clear_banks(void)
{
struct mce_bank *mce_banks = this_cpu_read(mce_banks_array);
int i;

for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
struct mce_bank *b = &mce_banks[i];

rdmsrl(msr_ops.ctl(i), b->ctl);

/* Bank is initialized if bits are set in hardware. */
b->init = !!b->ctl;
if (b->init) {
wrmsrl(msr_ops.ctl(i), b->ctl);
wrmsrl(msr_ops.status(i), 0);
}

}
}

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2019-05-17 00:14:38

by Yazen Ghannam

[permalink] [raw]
Subject: RE: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

> -----Original Message-----
> From: [email protected] <[email protected]> On Behalf Of Borislav Petkov
> Sent: Thursday, May 16, 2019 12:21 PM
> To: Ghannam, Yazen <[email protected]>
> Cc: Luck, Tony <[email protected]>; [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware
>
>
> On Thu, May 16, 2019 at 05:09:11PM +0000, Ghannam, Yazen wrote:
> > So that the sysfs files show the control values that are set in the
> > hardware. It seemed like this would be more helpful than showing all
> > 0xF's.
>
> Yeah, but it has been like that since forever and it hasn't bugged
> anybody. Probably because anybody doesn't even look at those files. As
> Tony says:
>
> "RAS is a lonely subsystem ... even EDAC gets more love."
>
> :-)))
>
> And adding yet another vendor check for this seemed just not worth it.
>
> > Should I send out another version of this set?
>
> I simply zapped 5/6. I still think your 6/6 makes sense though.
>
> ---
> From: Yazen Ghannam <[email protected]>
> Date: Tue, 30 Apr 2019 20:32:21 +0000
> Subject: [PATCH] x86/MCE: Determine MCA banks' init state properly
>
> The OS is expected to write all bits to MCA_CTL for each bank,
> thus enabling error reporting in all banks. However, some banks
> may be unused in which case the registers for such banks are
> Read-as-Zero/Writes-Ignored. Also, the OS may avoid setting some control
> bits because of quirks, etc.
>
> A bank can be considered uninitialized if the MCA_CTL register returns
> zero. This is because either the OS did not write anything or because
> the hardware is enforcing RAZ/WI for the bank.
>
> Set a bank's init value based on if the control bits are set or not in
> hardware. Return an error code in the sysfs interface for uninitialized
> banks.
>
> [ bp: Massage a bit. ]
>
> Signed-off-by: Yazen Ghannam <[email protected]>
> Signed-off-by: Borislav Petkov <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "[email protected]" <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Tony Luck <[email protected]>
> Cc: "[email protected]" <[email protected]>
> Link: https://lkml.kernel.org/r/[email protected]
> ---
> arch/x86/kernel/cpu/mce/core.c | 17 +++++++++++++----
> 1 file changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 5bcecadcf4d9..c049689f3d73 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -1567,10 +1567,13 @@ static void __mcheck_cpu_init_clear_banks(void)
> for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
> struct mce_bank *b = &mce_banks[i];
>
> - if (!b->init)
> - continue;
> - wrmsrl(msr_ops.ctl(i), b->ctl);
> - wrmsrl(msr_ops.status(i), 0);
> + if (b->init) {
> + wrmsrl(msr_ops.ctl(i), b->ctl);
> + wrmsrl(msr_ops.status(i), 0);
> + }
> +
> + /* Bank is initialized if bits are set in hardware. */
> + b->init = !!b->ctl;

We don't actually know if there are bits set in hardware until we read it back. So I don't think this is adding anything new.

Thanks,
Yazen

2019-05-17 11:23:03

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

On Thu, May 16, 2019 at 01:59:43PM -0700, Luck, Tony wrote:
> I think the intent of the original patch was to find out
> which bits are "implemented in hardware". I.e. throw all
> 1's at the register and see if any of them stick.

And, in addition, check ->init before showing/setting a bank:

---
@@ -2095,6 +2098,9 @@ static ssize_t show_bank(struct device *s, struct device_attribute *attr,

b = &per_cpu(mce_banks_array, s->id)[bank];

+ if (!b->init)
+ return -ENODEV;
+
return sprintf(buf, "%llx\n", b->ctl);
}

@@ -2113,6 +2119,9 @@ static ssize_t set_bank(struct device *s, struct device_attribute *attr,

b = &per_cpu(mce_banks_array, s->id)[bank];

+ if (!b->init)
+ return -ENODEV;
+
b->ctl = new;
mce_restart();
---

so that you get a feedback whether the setting has even succeeded or
not. Right now we're doing "something" blindly and accepting any b->ctl
from userspace. Yeah, it is root-only but still...

> I don't object to the idea behind the patch. But if you want
> to do this you just should not modify b->ctl.
>
> So something like:
>
>
> static void __mcheck_cpu_init_clear_banks(void)
> {
> struct mce_bank *mce_banks = this_cpu_read(mce_banks_array);
> u64 tmp;
> int i;
>
> for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
> struct mce_bank *b = &mce_banks[i];
>
> if (b->init) {
> wrmsrl(msr_ops.ctl(i), b->ctl);
> wrmsrl(msr_ops.status(i), 0);
> rdmsrl(msr_ops.ctl(i), tmp);
>
> /* Check if any bits implemented in h/w */
> b->init = !!tmp;
> }

... except that we unconditionally set ->init to 1 in
__mcheck_cpu_mce_banks_init() and I think we should query it. Btw, that
name __mcheck_cpu_mce_banks_init() is hideous too. I'll fix those up. In
the meantime, how does the below look like? The change is to tickle out
from the hw whether some CTL bits stick and then use that to determine
b->init setting:

---
From: Yazen Ghannam <[email protected]>
Date: Tue, 30 Apr 2019 20:32:21 +0000
Subject: [PATCH] x86/MCE: Determine MCA banks' init state properly

The OS is expected to write all bits to MCA_CTL for each bank,
thus enabling error reporting in all banks. However, some banks
may be unused in which case the registers for such banks are
Read-as-Zero/Writes-Ignored. Also, the OS may avoid setting some control
bits because of quirks, etc.

A bank can be considered uninitialized if the MCA_CTL register returns
zero. This is because either the OS did not write anything or because
the hardware is enforcing RAZ/WI for the bank.

Set a bank's init value based on if the control bits are set or not in
hardware. Return an error code in the sysfs interface for uninitialized
banks.

[ bp: Massage a bit. Discover bank init state at boot. ]

Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "[email protected]" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: "[email protected]" <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/mce/core.c | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 5bcecadcf4d9..d84b0c707d0e 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1492,9 +1492,16 @@ static int __mcheck_cpu_mce_banks_init(void)

for (i = 0; i < n_banks; i++) {
struct mce_bank *b = &mce_banks[i];
+ u64 val;

b->ctl = -1ULL;
- b->init = 1;
+
+ /* Check if any bits are implemented in h/w */
+ wrmsrl(msr_ops.ctl(i), b->ctl);
+ rdmsrl(msr_ops.ctl(i), val);
+ b->init = !!val;
+
+ wrmsrl(msr_ops.status(i), 0);
}

per_cpu(mce_banks_array, smp_processor_id()) = mce_banks;
@@ -1567,10 +1574,10 @@ static void __mcheck_cpu_init_clear_banks(void)
for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
struct mce_bank *b = &mce_banks[i];

- if (!b->init)
- continue;
- wrmsrl(msr_ops.ctl(i), b->ctl);
- wrmsrl(msr_ops.status(i), 0);
+ if (b->init) {
+ wrmsrl(msr_ops.ctl(i), b->ctl);
+ wrmsrl(msr_ops.status(i), 0);
+ }
}
}

@@ -2095,6 +2102,9 @@ static ssize_t show_bank(struct device *s, struct device_attribute *attr,

b = &per_cpu(mce_banks_array, s->id)[bank];

+ if (!b->init)
+ return -ENODEV;
+
return sprintf(buf, "%llx\n", b->ctl);
}

@@ -2113,6 +2123,9 @@ static ssize_t set_bank(struct device *s, struct device_attribute *attr,

b = &per_cpu(mce_banks_array, s->id)[bank];

+ if (!b->init)
+ return -ENODEV;
+
b->ctl = new;
mce_restart();

--
2.21.0

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2019-05-17 16:18:52

by Yazen Ghannam

[permalink] [raw]
Subject: RE: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

> -----Original Message-----
> From: [email protected] <[email protected]> On Behalf Of Borislav Petkov
> Sent: Friday, May 17, 2019 5:10 AM
> To: Luck, Tony <[email protected]>
> Cc: Ghannam, Yazen <[email protected]>; [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware
>
>
> On Thu, May 16, 2019 at 01:59:43PM -0700, Luck, Tony wrote:
> > I think the intent of the original patch was to find out
> > which bits are "implemented in hardware". I.e. throw all
> > 1's at the register and see if any of them stick.
>
> And, in addition, check ->init before showing/setting a bank:
>
> ---
> @@ -2095,6 +2098,9 @@ static ssize_t show_bank(struct device *s, struct device_attribute *attr,
>
> b = &per_cpu(mce_banks_array, s->id)[bank];
>
> + if (!b->init)
> + return -ENODEV;
> +
> return sprintf(buf, "%llx\n", b->ctl);
> }
>
> @@ -2113,6 +2119,9 @@ static ssize_t set_bank(struct device *s, struct device_attribute *attr,
>
> b = &per_cpu(mce_banks_array, s->id)[bank];
>
> + if (!b->init)
> + return -ENODEV;
> +
> b->ctl = new;
> mce_restart();
> ---
>
> so that you get a feedback whether the setting has even succeeded or
> not. Right now we're doing "something" blindly and accepting any b->ctl
> from userspace. Yeah, it is root-only but still...
>
> > I don't object to the idea behind the patch. But if you want
> > to do this you just should not modify b->ctl.
> >
> > So something like:
> >
> >
> > static void __mcheck_cpu_init_clear_banks(void)
> > {
> > struct mce_bank *mce_banks = this_cpu_read(mce_banks_array);
> > u64 tmp;
> > int i;
> >
> > for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
> > struct mce_bank *b = &mce_banks[i];
> >
> > if (b->init) {
> > wrmsrl(msr_ops.ctl(i), b->ctl);
> > wrmsrl(msr_ops.status(i), 0);
> > rdmsrl(msr_ops.ctl(i), tmp);
> >
> > /* Check if any bits implemented in h/w */
> > b->init = !!tmp;
> > }
>
> ... except that we unconditionally set ->init to 1 in
> __mcheck_cpu_mce_banks_init() and I think we should query it. Btw, that
> name __mcheck_cpu_mce_banks_init() is hideous too. I'll fix those up. In
> the meantime, how does the below look like? The change is to tickle out
> from the hw whether some CTL bits stick and then use that to determine
> b->init setting:
>
> ---
> From: Yazen Ghannam <[email protected]>
> Date: Tue, 30 Apr 2019 20:32:21 +0000
> Subject: [PATCH] x86/MCE: Determine MCA banks' init state properly
>
> The OS is expected to write all bits to MCA_CTL for each bank,
> thus enabling error reporting in all banks. However, some banks
> may be unused in which case the registers for such banks are
> Read-as-Zero/Writes-Ignored. Also, the OS may avoid setting some control
> bits because of quirks, etc.
>
> A bank can be considered uninitialized if the MCA_CTL register returns
> zero. This is because either the OS did not write anything or because
> the hardware is enforcing RAZ/WI for the bank.
>
> Set a bank's init value based on if the control bits are set or not in
> hardware. Return an error code in the sysfs interface for uninitialized
> banks.
>
> [ bp: Massage a bit. Discover bank init state at boot. ]
>
> Signed-off-by: Yazen Ghannam <[email protected]>
> Signed-off-by: Borislav Petkov <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "[email protected]" <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Tony Luck <[email protected]>
> Cc: "[email protected]" <[email protected]>
> Link: https://lkml.kernel.org/r/[email protected]
> ---
> arch/x86/kernel/cpu/mce/core.c | 23 ++++++++++++++++++-----
> 1 file changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 5bcecadcf4d9..d84b0c707d0e 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -1492,9 +1492,16 @@ static int __mcheck_cpu_mce_banks_init(void)
>
> for (i = 0; i < n_banks; i++) {
> struct mce_bank *b = &mce_banks[i];
> + u64 val;
>
> b->ctl = -1ULL;
> - b->init = 1;
> +
> + /* Check if any bits are implemented in h/w */
> + wrmsrl(msr_ops.ctl(i), b->ctl);
> + rdmsrl(msr_ops.ctl(i), val);
> + b->init = !!val;
> +
> + wrmsrl(msr_ops.status(i), 0);
> }

I think there are a couple of issues here.
1) The bank is being initialized without accounting for any quirks.
2) The bank is being initialized without having set up any handler or other appropriate setup.

Thanks,
Yazen

2019-05-17 17:50:45

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

On Fri, May 17, 2019 at 10:26:49AM -0700, Luck, Tony wrote:
> Which is a quirk for some models where we don't want to do
> the "write all 1s and see what sticks"

Ok, then we have to do what you suggested yesterday. I've added a short
comment so that I don't get lost again next time.

---
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 5bcecadcf4d9..9056f0a2a90d 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1493,6 +1493,11 @@ static int __mcheck_cpu_mce_banks_init(void)
for (i = 0; i < n_banks; i++) {
struct mce_bank *b = &mce_banks[i];

+ /*
+ * Init them all, __mcheck_cpu_apply_quirks() is going to apply
+ * the required vendor quirks before
+ * __mcheck_cpu_init_clear_banks() does the final bank setup.
+ */
b->ctl = -1ULL;
b->init = 1;
}
@@ -1562,15 +1567,21 @@ static void __mcheck_cpu_init_generic(void)
static void __mcheck_cpu_init_clear_banks(void)
{
struct mce_bank *mce_banks = this_cpu_read(mce_banks_array);
+ u64 msrval;
int i;

for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
struct mce_bank *b = &mce_banks[i];

- if (!b->init)
- continue;
- wrmsrl(msr_ops.ctl(i), b->ctl);
- wrmsrl(msr_ops.status(i), 0);
+ if (b->init) {
+ /* Check if any bits are implemented in h/w */
+ wrmsrl(msr_ops.ctl(i), b->ctl);
+ rdmsrl(msr_ops.ctl(i), msrval);
+
+ b->init = !!msrval;
+
+ wrmsrl(msr_ops.status(i), 0);
+ }
}
}

@@ -2095,6 +2106,9 @@ static ssize_t show_bank(struct device *s, struct device_attribute *attr,

b = &per_cpu(mce_banks_array, s->id)[bank];

+ if (!b->init)
+ return -ENODEV;
+
return sprintf(buf, "%llx\n", b->ctl);
}

@@ -2113,6 +2127,9 @@ static ssize_t set_bank(struct device *s, struct device_attribute *attr,

b = &per_cpu(mce_banks_array, s->id)[bank];

+ if (!b->init)
+ return -ENODEV;
+
b->ctl = new;
mce_restart();

--
2.21.0

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2019-05-17 18:20:35

by Luck, Tony

[permalink] [raw]
Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

On Fri, May 17, 2019 at 06:37:29PM +0200, Borislav Petkov wrote:
> Now, the
>
> wrmsrl(msr_ops.ctl(i), -1)
> rdmsrl(msr_ops.ctl(i), val);
>
> method of throwing all 1s to see what sticks is what Intel wants, as
> Tony said. Is that going to be a problem on AMD?

It is what we want in general ... but there is this:

if (c->x86_vendor == X86_VENDOR_INTEL) {
/*
* SDM documents that on family 6 bank 0 should not be written
* because it aliases to another special BIOS controlled
* register.
* But it's not aliased anymore on model 0x1a+
* Don't ignore bank 0 completely because there could be a
* valid event later, merely don't write CTL0.
*/

if (c->x86 == 6 && c->x86_model < 0x1A && cfg->banks > 0)
mce_banks[0].init = 0;

Which is a quirk for some models where we don't want to do
the "write all 1s and see what sticks"

-Tony

2019-05-17 18:32:57

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

On Fri, May 17, 2019 at 03:46:07PM +0000, Ghannam, Yazen wrote:
> I think there are a couple of issues here.
> 1) The bank is being initialized without accounting for any quirks.

Almost. __mcheck_cpu_init_clear_banks() a little bit later corrects
that. I guess I can drop the

wrmsrl(msr_ops.status(i), 0);

in here because __mcheck_cpu_init_clear_banks() does that too.

Now, the

wrmsrl(msr_ops.ctl(i), -1)
rdmsrl(msr_ops.ctl(i), val);

method of throwing all 1s to see what sticks is what Intel wants, as
Tony said. Is that going to be a problem on AMD?

> 2) The bank is being initialized without having set up any handler or
> other appropriate setup.

I'm afraid you're going to have to explain this in more detail...

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2019-05-17 19:17:47

by Luck, Tony

[permalink] [raw]
Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

On Fri, May 17, 2019 at 07:48:17PM +0200, Borislav Petkov wrote:
> @@ -1562,15 +1567,21 @@ static void __mcheck_cpu_init_generic(void)
> static void __mcheck_cpu_init_clear_banks(void)
> {
> struct mce_bank *mce_banks = this_cpu_read(mce_banks_array);
> + u64 msrval;
> int i;
>
> for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
> struct mce_bank *b = &mce_banks[i];
>
> - if (!b->init)
> - continue;
> - wrmsrl(msr_ops.ctl(i), b->ctl);
> - wrmsrl(msr_ops.status(i), 0);
> + if (b->init) {
> + /* Check if any bits are implemented in h/w */
> + wrmsrl(msr_ops.ctl(i), b->ctl);
> + rdmsrl(msr_ops.ctl(i), msrval);
> +
> + b->init = !!msrval;
> +
> + wrmsrl(msr_ops.status(i), 0);
> + }
> }
> }

Am I misreading the diff here? It doesn't look like you
needed to drop the

if (!b->init)
continue;

and thus end up with that extra level on indent for the rest
of the function.

-Tony

2019-05-17 19:45:22

by Luck, Tony

[permalink] [raw]
Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

On Fri, May 17, 2019 at 09:34:31PM +0200, Borislav Petkov wrote:
> On Fri, May 17, 2019 at 11:06:07AM -0700, Luck, Tony wrote:
> > and thus end up with that extra level on indent for the rest
> > of the function.
>
> Ok:
>
> @@ -1569,7 +1575,13 @@ static void __mcheck_cpu_init_clear_banks(void)
>
> if (!b->init)
> continue;
> +
> + /* Check if any bits are implemented in h/w */
> wrmsrl(msr_ops.ctl(i), b->ctl);
> + rdmsrl(msr_ops.ctl(i), msrval);
> +
> + b->init = !!msrval;
> +
> wrmsrl(msr_ops.status(i), 0);
> }

Much neater :-)

Acked-by: Tony Luck <[email protected]>

2019-05-17 19:50:38

by Yazen Ghannam

[permalink] [raw]
Subject: RE: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

> -----Original Message-----
> From: [email protected] <[email protected]> On Behalf Of Borislav Petkov
> Sent: Friday, May 17, 2019 2:35 PM
> To: Luck, Tony <[email protected]>
> Cc: Ghannam, Yazen <[email protected]>; [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware
>
>
> On Fri, May 17, 2019 at 11:06:07AM -0700, Luck, Tony wrote:
> > and thus end up with that extra level on indent for the rest
> > of the function.
>
> Ok:
>
> ---
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 5bcecadcf4d9..25e501a853cd 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -1493,6 +1493,11 @@ static int __mcheck_cpu_mce_banks_init(void)
> for (i = 0; i < n_banks; i++) {
> struct mce_bank *b = &mce_banks[i];
>
> + /*
> + * Init them all, __mcheck_cpu_apply_quirks() is going to apply
> + * the required vendor quirks before
> + * __mcheck_cpu_init_clear_banks() does the final bank setup.
> + */
> b->ctl = -1ULL;
> b->init = 1;
> }
> @@ -1562,6 +1567,7 @@ static void __mcheck_cpu_init_generic(void)
> static void __mcheck_cpu_init_clear_banks(void)
> {
> struct mce_bank *mce_banks = this_cpu_read(mce_banks_array);
> + u64 msrval;
> int i;
>
> for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
> @@ -1569,7 +1575,13 @@ static void __mcheck_cpu_init_clear_banks(void)
>
> if (!b->init)
> continue;
> +
> + /* Check if any bits are implemented in h/w */
> wrmsrl(msr_ops.ctl(i), b->ctl);
> + rdmsrl(msr_ops.ctl(i), msrval);
> +
> + b->init = !!msrval;
> +

Just a minor nit, but can we group the comment, RDMSR, and check together? The WRMSR is part of normal operation and isn't tied to the check.

Thanks,
Yazen

2019-05-17 19:51:42

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

On Fri, May 17, 2019 at 12:44:05PM -0700, Luck, Tony wrote:
> Much neater :-)

Finally! :-)

> Acked-by: Tony Luck <[email protected]>

Thx.

Yazen, any objections left?

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2019-05-17 21:11:25

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

On Fri, May 17, 2019 at 07:49:10PM +0000, Ghannam, Yazen wrote:
> > @@ -1569,7 +1575,13 @@ static void __mcheck_cpu_init_clear_banks(void)
> >
> > if (!b->init)
> > continue;
> > +
> > + /* Check if any bits are implemented in h/w */
> > wrmsrl(msr_ops.ctl(i), b->ctl);
> > + rdmsrl(msr_ops.ctl(i), msrval);
> > +
> > + b->init = !!msrval;
> > +
> Just a minor nit, but can we group the comment, RDMSR, and check
> together? The WRMSR is part of normal operation and isn't tied to the
> check.

Of course it is - that's the "throw all 1s at it" part :)

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2019-05-17 21:51:27

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

On Fri, May 17, 2019 at 11:06:07AM -0700, Luck, Tony wrote:
> and thus end up with that extra level on indent for the rest
> of the function.

Ok:

---
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 5bcecadcf4d9..25e501a853cd 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1493,6 +1493,11 @@ static int __mcheck_cpu_mce_banks_init(void)
for (i = 0; i < n_banks; i++) {
struct mce_bank *b = &mce_banks[i];

+ /*
+ * Init them all, __mcheck_cpu_apply_quirks() is going to apply
+ * the required vendor quirks before
+ * __mcheck_cpu_init_clear_banks() does the final bank setup.
+ */
b->ctl = -1ULL;
b->init = 1;
}
@@ -1562,6 +1567,7 @@ static void __mcheck_cpu_init_generic(void)
static void __mcheck_cpu_init_clear_banks(void)
{
struct mce_bank *mce_banks = this_cpu_read(mce_banks_array);
+ u64 msrval;
int i;

for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
@@ -1569,7 +1575,13 @@ static void __mcheck_cpu_init_clear_banks(void)

if (!b->init)
continue;
+
+ /* Check if any bits are implemented in h/w */
wrmsrl(msr_ops.ctl(i), b->ctl);
+ rdmsrl(msr_ops.ctl(i), msrval);
+
+ b->init = !!msrval;
+
wrmsrl(msr_ops.status(i), 0);
}
}
@@ -2095,6 +2107,9 @@ static ssize_t show_bank(struct device *s, struct device_attribute *attr,

b = &per_cpu(mce_banks_array, s->id)[bank];

+ if (!b->init)
+ return -ENODEV;
+
return sprintf(buf, "%llx\n", b->ctl);
}

@@ -2113,6 +2128,9 @@ static ssize_t set_bank(struct device *s, struct device_attribute *attr,

b = &per_cpu(mce_banks_array, s->id)[bank];

+ if (!b->init)
+ return -ENODEV;
+
b->ctl = new;
mce_restart();


--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2019-05-18 15:39:17

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v3 4/6] x86/MCE: Make number of MCA banks per_cpu

On Tue, Apr 30, 2019 at 08:32:20PM +0000, Ghannam, Yazen wrote:
> From: Yazen Ghannam <[email protected]>
>
> The number of MCA banks is provided per logical CPU. Historically, this
> number has been the same across all CPUs, but this is not an
> architectural guarantee. Future AMD systems may have MCA bank counts
> that vary between logical CPUs in a system.
>
> This issue was partially addressed in
>
> 006c077041dc ("x86/mce: Handle varying MCA bank counts")
>
> by allocating structures using the maximum number of MCA banks and by
> saving the maximum MCA bank count in a system as the global count. This
> means that some extra structures are allocated. Also, this means that
> CPUs will spend more time in the #MC and other handlers checking extra
> MCA banks.

...

> @@ -1480,14 +1482,15 @@ EXPORT_SYMBOL_GPL(mce_notify_irq);
>
> static int __mcheck_cpu_mce_banks_init(void)
> {
> + u8 n_banks = this_cpu_read(mce_num_banks);
> struct mce_bank *mce_banks;
> int i;
>
> - mce_banks = kcalloc(MAX_NR_BANKS, sizeof(struct mce_bank), GFP_KERNEL);
> + mce_banks = kcalloc(n_banks, sizeof(struct mce_bank), GFP_KERNEL);

Something changed in mm land or maybe we were lucky and got away with an
atomic GFP_KERNEL allocation until now but:

[ 2.447838] smp: Bringing up secondary CPUs ...
[ 2.456895] x86: Booting SMP configuration:
[ 2.457822] .... node #0, CPUs: #1
[ 1.344284] BUG: sleeping function called from invalid context at mm/slab.h:418
[ 1.344284] in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
[ 1.344284] no locks held by swapper/1/0.
[ 1.344284] irq event stamp: 0
[ 1.344284] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[ 1.344284] hardirqs last disabled at (0): [<ffffffff8106bd36>] copy_process.part.59+0x676/0x1b60
[ 1.344284] softirqs last enabled at (0): [<ffffffff8106bd36>] copy_process.part.59+0x676/0x1b60
[ 1.344284] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ 1.344284] Preemption disabled at:
[ 1.344284] [<ffffffff81042a60>] start_secondary+0x50/0x170
[ 1.344284] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.1.0+ #1
[ 1.344284] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A13 05/11/2014
[ 1.344284] Call Trace:
[ 1.344284] dump_stack+0x67/0x9b
[ 1.344284] ___might_sleep+0x16c/0x250
[ 1.344284] __kmalloc+0x182/0x280
[ 1.344284] mcheck_cpu_init+0x1d9/0x430
[ 1.344284] identify_cpu+0x3e5/0x720
[ 1.344284] identify_secondary_cpu+0x13/0x80
[ 1.344284] smp_store_cpu_info+0x48/0x60
[ 1.344284] start_secondary+0x63/0x170
[ 1.344284] ? set_cpu_sibling_map+0x500/0x500
[ 1.344284] secondary_startup_64+0xa4/0xb0
[ 2.587866] #2 #3 #4
[ 2.609287] MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
[ 2.618875] #5 #6 #7
[ 2.638092] smp: Brought up 1 node, 8 CPUs
[ 2.639814] smpboot: Max logical packages: 4
[ 2.640830] smpboot: Total of 8 processors activated (57466.51 BogoMIPS)

.config is attached and branch is at

https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=rc0%2b3-ras

Thx.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Attachments:
(No filename) (3.31 kB)
.config (120.62 kB)
Download all attachments

2019-05-21 17:55:19

by Yazen Ghannam

[permalink] [raw]
Subject: RE: [PATCH v3 4/6] x86/MCE: Make number of MCA banks per_cpu

> -----Original Message-----
> From: Borislav Petkov <[email protected]>
> Sent: Saturday, May 18, 2019 6:26 AM
> To: Ghannam, Yazen <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH v3 4/6] x86/MCE: Make number of MCA banks per_cpu
>
>
> On Tue, Apr 30, 2019 at 08:32:20PM +0000, Ghannam, Yazen wrote:
> > From: Yazen Ghannam <[email protected]>
> >
> > The number of MCA banks is provided per logical CPU. Historically, this
> > number has been the same across all CPUs, but this is not an
> > architectural guarantee. Future AMD systems may have MCA bank counts
> > that vary between logical CPUs in a system.
> >
> > This issue was partially addressed in
> >
> > 006c077041dc ("x86/mce: Handle varying MCA bank counts")
> >
> > by allocating structures using the maximum number of MCA banks and by
> > saving the maximum MCA bank count in a system as the global count. This
> > means that some extra structures are allocated. Also, this means that
> > CPUs will spend more time in the #MC and other handlers checking extra
> > MCA banks.
>
> ...
>
> > @@ -1480,14 +1482,15 @@ EXPORT_SYMBOL_GPL(mce_notify_irq);
> >
> > static int __mcheck_cpu_mce_banks_init(void)
> > {
> > + u8 n_banks = this_cpu_read(mce_num_banks);
> > struct mce_bank *mce_banks;
> > int i;
> >
> > - mce_banks = kcalloc(MAX_NR_BANKS, sizeof(struct mce_bank), GFP_KERNEL);
> > + mce_banks = kcalloc(n_banks, sizeof(struct mce_bank), GFP_KERNEL);
>
> Something changed in mm land or maybe we were lucky and got away with an
> atomic GFP_KERNEL allocation until now but:
>
> [ 2.447838] smp: Bringing up secondary CPUs ...
> [ 2.456895] x86: Booting SMP configuration:
> [ 2.457822] .... node #0, CPUs: #1

The issue seems to be that the allocation is now happening on CPUs other than CPU0.

Patch 2 in this set has the same issue. I didn't see it until I turned on the "Lock Debugging" config options.

> [ 1.344284] BUG: sleeping function called from invalid context at mm/slab.h:418

This message comes from ___might_sleep() which checks the system_state.

On CPU0, system_state=SYSTEM_BOOTING.

On every other CPU, system_state=SYSTEM_SCHEDULING, and that's the only system_state where the message is shown.

Changing GFP_KERNEL to GFP_ATOMIC seems to be a fix. Is this appropriate? Or do you think there's something else we could try?

Thanks,
Yazen

2019-05-21 20:31:57

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v3 4/6] x86/MCE: Make number of MCA banks per_cpu

On Tue, May 21, 2019 at 05:52:42PM +0000, Ghannam, Yazen wrote:
> This message comes from ___might_sleep() which checks the
> system_state.
>
> On CPU0, system_state=SYSTEM_BOOTING.
>
> On every other CPU, system_state=SYSTEM_SCHEDULING, and that's the
> only system_state where the message is shown.

Right, the check in ___might_sleep().

> Changing GFP_KERNEL to GFP_ATOMIC seems to be a fix. Is this
> appropriate? Or do you think there's something else we could try?

From: Documentation/core-api/memory-allocation.rst

* If you think that accessing memory reserves is justified and the kernel
will be stressed unless allocation succeeds, you may use ``GFP_ATOMIC``.


I don't think the MCA banks representation justifies accessing memory
reserves.

Can we do instead:

-static DEFINE_PER_CPU_READ_MOSTLY(struct mce_bank *, mce_banks_array);
+static DEFINE_PER_CPU_READ_MOSTLY(struct mce_bank, mce_banks_array[MAX_NR_BANKS]);

which should be something like 9*32 = 288 bytes per CPU.

Unless you have a better idea...

Thx.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply. Srsly.

2019-05-21 20:45:13

by Luck, Tony

[permalink] [raw]
Subject: Re: [PATCH v3 4/6] x86/MCE: Make number of MCA banks per_cpu

On Tue, May 21, 2019 at 10:29:02PM +0200, Borislav Petkov wrote:
>
> Can we do instead:
>
> -static DEFINE_PER_CPU_READ_MOSTLY(struct mce_bank *, mce_banks_array);
> +static DEFINE_PER_CPU_READ_MOSTLY(struct mce_bank, mce_banks_array[MAX_NR_BANKS]);
>
> which should be something like 9*32 = 288 bytes per CPU.
>

Where did you get the "9" from? struct mce_bank looks to
be over 50 bytes.

Still only 1.5K per cpu though.

-Tony

2019-05-21 23:10:27

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v3 4/6] x86/MCE: Make number of MCA banks per_cpu

On Tue, May 21, 2019 at 01:42:40PM -0700, Luck, Tony wrote:
> On Tue, May 21, 2019 at 10:29:02PM +0200, Borislav Petkov wrote:
> >
> > Can we do instead:
> >
> > -static DEFINE_PER_CPU_READ_MOSTLY(struct mce_bank *, mce_banks_array);
> > +static DEFINE_PER_CPU_READ_MOSTLY(struct mce_bank, mce_banks_array[MAX_NR_BANKS]);
> >
> > which should be something like 9*32 = 288 bytes per CPU.
> >
>
> Where did you get the "9" from? struct mce_bank looks to
> be over 50 bytes.

Patch 2/6 changes that:

struct mce_bank {
u64 ctl; /* subevents to enable */
bool init; /* initialise bank? */
+};
+static DEFINE_PER_CPU_READ_MOSTLY(struct mce_bank *, mce_banks_percpu);
+
+#define ATTR_LEN 16
+/* One object for each MCE bank, shared by all CPUs */
+struct mce_bank_dev {
struct device_attribute attr; /* device attribute */
char attrname[ATTR_LEN]; /* attribute name */
+ u8 bank; /* bank number */
};
+static struct mce_bank_dev mce_bank_devs[MAX_NR_BANKS];

> Still only 1.5K per cpu though.

Yah, I think that using static per-CPU memory should be better than
GFP_ATOMIC.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply. Srsly.

2019-05-22 14:03:15

by Yazen Ghannam

[permalink] [raw]
Subject: RE: [PATCH v3 4/6] x86/MCE: Make number of MCA banks per_cpu

> -----Original Message-----
> From: [email protected] <[email protected]> On Behalf Of Borislav Petkov
> Sent: Tuesday, May 21, 2019 6:09 PM
> To: Luck, Tony <[email protected]>
> Cc: Ghannam, Yazen <[email protected]>; [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH v3 4/6] x86/MCE: Make number of MCA banks per_cpu
>
>
> On Tue, May 21, 2019 at 01:42:40PM -0700, Luck, Tony wrote:
> > On Tue, May 21, 2019 at 10:29:02PM +0200, Borislav Petkov wrote:
> > >
> > > Can we do instead:
> > >
> > > -static DEFINE_PER_CPU_READ_MOSTLY(struct mce_bank *, mce_banks_array);
> > > +static DEFINE_PER_CPU_READ_MOSTLY(struct mce_bank, mce_banks_array[MAX_NR_BANKS]);
> > >
> > > which should be something like 9*32 = 288 bytes per CPU.
> > >
> >
> > Where did you get the "9" from? struct mce_bank looks to
> > be over 50 bytes.
>
> Patch 2/6 changes that:
>
> struct mce_bank {
> u64 ctl; /* subevents to enable */
> bool init; /* initialise bank? */
> +};
> +static DEFINE_PER_CPU_READ_MOSTLY(struct mce_bank *, mce_banks_percpu);
> +
> +#define ATTR_LEN 16
> +/* One object for each MCE bank, shared by all CPUs */
> +struct mce_bank_dev {
> struct device_attribute attr; /* device attribute */
> char attrname[ATTR_LEN]; /* attribute name */
> + u8 bank; /* bank number */
> };
> +static struct mce_bank_dev mce_bank_devs[MAX_NR_BANKS];
>
> > Still only 1.5K per cpu though.
>
> Yah, I think that using static per-CPU memory should be better than
> GFP_ATOMIC.
>

Okay, makes sense. I'll send a patch soon.

Thanks,
Yazen

2019-05-23 20:02:28

by Yazen Ghannam

[permalink] [raw]
Subject: RE: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

> -----Original Message-----
> From: Borislav Petkov <[email protected]>
> Sent: Friday, May 17, 2019 3:02 PM
> To: Ghannam, Yazen <[email protected]>
> Cc: Luck, Tony <[email protected]>; [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware
>
>
> On Fri, May 17, 2019 at 07:49:10PM +0000, Ghannam, Yazen wrote:
> > > @@ -1569,7 +1575,13 @@ static void __mcheck_cpu_init_clear_banks(void)
> > >
> > > if (!b->init)
> > > continue;
> > > +
> > > + /* Check if any bits are implemented in h/w */
> > > wrmsrl(msr_ops.ctl(i), b->ctl);
> > > + rdmsrl(msr_ops.ctl(i), msrval);
> > > +
> > > + b->init = !!msrval;
> > > +
> > Just a minor nit, but can we group the comment, RDMSR, and check
> > together? The WRMSR is part of normal operation and isn't tied to the
> > check.
>
> Of course it is - that's the "throw all 1s at it" part :)
>

I did a bit more testing and I noticed that writing "0" disables a bank with no way to reenable it.

For example:
1) Read bank10.
a) Succeeds; returns "fffffffffffffff".
2) Write "0" to bank10.
a) Succeeds; hardware register is set to "0".
b) Hardware register is checked, and b->init=0.
3) Read bank10.
a) Fails, because b->init=0.
4) Write non-zero value to bank10 to reenable it.
a) Fails, because b->init=0.
5) Reboot needed to reset bank.

Is that okay?

Thanks,
Yazen

2019-05-27 23:31:11

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

On Thu, May 23, 2019 at 08:00:33PM +0000, Ghannam, Yazen wrote:
> I did a bit more testing and I noticed that writing "0" disables a bank with no way to reenable it.
>
> For example:
> 1) Read bank10.
> a) Succeeds; returns "fffffffffffffff".
> 2) Write "0" to bank10.
> a) Succeeds; hardware register is set to "0".
> b) Hardware register is checked, and b->init=0.
> 3) Read bank10.
> a) Fails, because b->init=0.
> 4) Write non-zero value to bank10 to reenable it.
> a) Fails, because b->init=0.
> 5) Reboot needed to reset bank.
>
> Is that okay?

Nope, that doesn't sound correct to me.

I guess the cleanest way to handle his properly would be to have a
function called something like __mcheck_cpu_init_banks() which gets
called in mcheck_cpu_init() after the quirks have run and then does the
final poking of the banks and sets b->init properly.

__mcheck_cpu_init_clear_banks() should then be renamed to
__mcheck_cpu_clear_banks() to denote that it only clears the banks and
would only do:

if (!b->init)
continue;

wrmsrl(msr_ops.ctl(i), b->ctl);
wrmsrl(msr_ops.status(i), 0);

And then sprinkle some commenting to not forget the scheme again.

Yeah, this sounds clean to me but you guys might have a better idea...

Thx.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply. Srsly.

2019-06-07 14:51:53

by Yazen Ghannam

[permalink] [raw]
Subject: RE: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

> -----Original Message-----
> From: Borislav Petkov <[email protected]>
> Sent: Monday, May 27, 2019 6:29 PM
> To: Ghannam, Yazen <[email protected]>
> Cc: Luck, Tony <[email protected]>; [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware
>
>
> I guess the cleanest way to handle his properly would be to have a
> function called something like __mcheck_cpu_init_banks() which gets
> called in mcheck_cpu_init() after the quirks have run and then does the
> final poking of the banks and sets b->init properly.
>
> __mcheck_cpu_init_clear_banks() should then be renamed to
> __mcheck_cpu_clear_banks() to denote that it only clears the banks and
> would only do:
>
> if (!b->init)
> continue;
>
> wrmsrl(msr_ops.ctl(i), b->ctl);
> wrmsrl(msr_ops.status(i), 0);
>

Would you mind if the function name stayed the same? The reason is that MCA_CTL is written here, which is the "init" part, and MCA_STATUS is cleared.

I can use another name for the check, e.g. __mcheck_cpu_check_banks() or __mcheck_cpu_banks_check_init().

Thanks,
Yazen

2019-06-07 16:39:03

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

On Fri, Jun 07, 2019 at 02:49:42PM +0000, Ghannam, Yazen wrote:
> Would you mind if the function name stayed the same? The reason is
> that MCA_CTL is written here, which is the "init" part, and MCA_STATUS
> is cleared.
>
> I can use another name for the check, e.g. __mcheck_cpu_check_banks()
> or __mcheck_cpu_banks_check_init().

Nevermind, leave it as is. I'll fix it up ontop. I don't like that
"__mcheck_cpu_init" prefixing there which is a mouthful and should
simply be "mce_cpu_<do_stuff>" to denote that it is a function which is
run on a CPU to setup stuff.

Thx.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2019-06-07 16:46:13

by Yazen Ghannam

[permalink] [raw]
Subject: RE: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

> -----Original Message-----
> From: [email protected] <[email protected]> On Behalf Of Borislav Petkov
> Sent: Friday, June 7, 2019 11:37 AM
> To: Ghannam, Yazen <[email protected]>
> Cc: Luck, Tony <[email protected]>; [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware
>
> On Fri, Jun 07, 2019 at 02:49:42PM +0000, Ghannam, Yazen wrote:
> > Would you mind if the function name stayed the same? The reason is
> > that MCA_CTL is written here, which is the "init" part, and MCA_STATUS
> > is cleared.
> >
> > I can use another name for the check, e.g. __mcheck_cpu_check_banks()
> > or __mcheck_cpu_banks_check_init().
>
> Nevermind, leave it as is. I'll fix it up ontop. I don't like that
> "__mcheck_cpu_init" prefixing there which is a mouthful and should
> simply be "mce_cpu_<do_stuff>" to denote that it is a function which is
> run on a CPU to setup stuff.
>

Yeah, I agree.

I have another version of this set that I can send today. It includes the changes for this patch and also includes the fix for the locking bug message.

Should I send out the new version? Or do you want me to wait for any fixes on top of the current version?

Thanks,
Yazen

2019-06-07 18:56:35

by Yazen Ghannam

[permalink] [raw]
Subject: RE: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

> -----Original Message-----
> From: [email protected] <[email protected]> On Behalf Of Borislav Petkov
> Sent: Friday, June 7, 2019 11:59 AM
> To: Ghannam, Yazen <[email protected]>
> Cc: Luck, Tony <[email protected]>; [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware
>
> On Fri, Jun 07, 2019 at 04:44:24PM +0000, Ghannam, Yazen wrote:
> > I have another version of this set that I can send today. It includes
> > the changes for this patch and also includes the fix for the locking
> > bug message.
> >
> > Should I send out the new version? Or do you want me to wait for any
> > fixes on top of the current version?
>
> I don't understand - I think we said to feel free to rework it all by using
>
> https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=rc0%2b3-ras
>
> and reworking the whole branch to accomodate the changes and then
> sending a whole new series...
>

Right, I took that branch, squashed the locking fix into patch 2, fixed up the remaining patches, and then redid the last patch.

I plan to send the result as a v4 of this patchset with all the links, version history, etc. Is that what you mean? Sorry, if I misunderstood.

Thanks,
Yazen

2019-06-07 18:58:32

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

On Fri, Jun 07, 2019 at 05:08:04PM +0000, Ghannam, Yazen wrote:
> Right, I took that branch, squashed the locking fix into patch 2,
> fixed up the remaining patches, and then redid the last patch.
>
> I plan to send the result as a v4 of this patchset with all the links,
> version history, etc. Is that what you mean?

Yap, exactly!

> Sorry, if I misunderstood.

No worries, we're on the same page now :-)

Thx.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2019-06-07 19:42:27

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

On Fri, Jun 07, 2019 at 04:44:24PM +0000, Ghannam, Yazen wrote:
> I have another version of this set that I can send today. It includes
> the changes for this patch and also includes the fix for the locking
> bug message.
>
> Should I send out the new version? Or do you want me to wait for any
> fixes on top of the current version?

I don't understand - I think we said to feel free to rework it all by using

https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=rc0%2b3-ras

and reworking the whole branch to accomodate the changes and then
sending a whole new series...

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2019-06-11 05:14:46

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

On Fri, Jun 07, 2019 at 06:37:23PM +0200, Borislav Petkov wrote:
> On Fri, Jun 07, 2019 at 02:49:42PM +0000, Ghannam, Yazen wrote:
> > Would you mind if the function name stayed the same? The reason is
> > that MCA_CTL is written here, which is the "init" part, and MCA_STATUS
> > is cleared.
> >
> > I can use another name for the check, e.g. __mcheck_cpu_check_banks()
> > or __mcheck_cpu_banks_check_init().
>
> Nevermind, leave it as is. I'll fix it up ontop. I don't like that
> "__mcheck_cpu_init" prefixing there which is a mouthful and should
> simply be "mce_cpu_<do_stuff>" to denote that it is a function which is
> run on a CPU to setup stuff.

So I'm staring at this and I can't say that I'm getting any good ideas:

I wanna get rid of that ugly "__mcheck_cpu_" prefix but the replacements
I can think of right now, are crap:

* I can call them all "cpu_<bla>" but then they look like generic
cpu-setup functions which come from kernel/cpu.c or so.

* I can prefix them with "mce_cpu" but when you do them all, it becomes
a block of "mce_cpu_" stuff which ain't more readable either. And
besides, those are static functions so they shouldn't need the prefix.
But I'd like the naming to denote that they're doing per-CPU setup
stuff. Which brings me to the previous point.

So no, don't have a good idea yet...

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.