2024-03-07 00:07:41

by Tony Luck

Subject: [PATCH v2] x86/mce: Dynamically size space for machine check records

Systems with a large number of CPUs may generate a large
number of machine check records when things go seriously
wrong. But Linux has a fixed buffer that can only capture
a few dozen errors.

Allocate space based on the number of CPUs (with a minimum
value based on the historical fixed buffer that could store
80 records).

Signed-off-by: Tony Luck <[email protected]>
---

Changes since v1: Link: https://lore.kernel.org/all/Zd--PJp-NbXGrb39@agluck-desk3/

Sohil:
Group declaration of "order" with other ints in mce_gen_pool_create()
Use #define MCE_MIN_ENTRIES instead of hard-coded inline "80"
Add missing kfree(mce_pool) in error path.

Yazen:
Use order_base_2() instead of ilog2() as rounded up size of
structure is needed.

Avadhut:
Allocate 2 records per CPU

Me:
Add a #define MCE_PER_CPU for number of records per CPU

arch/x86/kernel/cpu/mce/genpool.c | 23 +++++++++++++++++------
1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/genpool.c b/arch/x86/kernel/cpu/mce/genpool.c
index fbe8b61c3413..42ce3dc97ca8 100644
--- a/arch/x86/kernel/cpu/mce/genpool.c
+++ b/arch/x86/kernel/cpu/mce/genpool.c
@@ -16,14 +16,14 @@
* used to save error information organized in a lock-less list.
*
* This memory pool is only to be used to save MCE records in MCE context.
- * MCE events are rare, so a fixed size memory pool should be enough. Use
- * 2 pages to save MCE events for now (~80 MCE records at most).
+ * MCE events are rare, so a fixed size memory pool should be enough.
+ * Allocate on a sliding scale based on number of CPUs.
*/
-#define MCE_POOLSZ (2 * PAGE_SIZE)
+#define MCE_MIN_ENTRIES 80
+#define MCE_PER_CPU 2

static struct gen_pool *mce_evt_pool;
static LLIST_HEAD(mce_event_llist);
-static char gen_pool_buf[MCE_POOLSZ];

/*
* Compare the record "t" with each of the records on list "l" to see if
@@ -118,16 +118,27 @@ int mce_gen_pool_add(struct mce *mce)

static int mce_gen_pool_create(void)
{
+ int mce_numrecords, mce_poolsz, order;
struct gen_pool *tmpp;
int ret = -ENOMEM;
+ void *mce_pool;

- tmpp = gen_pool_create(ilog2(sizeof(struct mce_evt_llist)), -1);
+ order = order_base_2(sizeof(struct mce_evt_llist));
+ tmpp = gen_pool_create(order, -1);
if (!tmpp)
goto out;

- ret = gen_pool_add(tmpp, (unsigned long)gen_pool_buf, MCE_POOLSZ, -1);
+ mce_numrecords = max(MCE_MIN_ENTRIES, num_possible_cpus() * MCE_PER_CPU);
+ mce_poolsz = mce_numrecords * (1 << order);
+ mce_pool = kmalloc(mce_poolsz, GFP_KERNEL);
+ if (!mce_pool) {
+ gen_pool_destroy(tmpp);
+ goto out;
+ }
+ ret = gen_pool_add(tmpp, (unsigned long)mce_pool, mce_poolsz, -1);
if (ret) {
gen_pool_destroy(tmpp);
+ kfree(mce_pool);
goto out;
}


base-commit: d206a76d7d2726f3b096037f2079ce0bd3ba329b
--
2.43.0
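
To make the sizing arithmetic in the patch above concrete, here is a minimal userspace sketch. It is not kernel code: floor_log2() and ceil_log2() are hypothetical stand-ins for ilog2() and order_base_2(), mce_pool_bytes() is an illustrative helper name, and the 136-byte record size used in the note that follows is an assumed value, not the real sizeof(struct mce_evt_llist).

```c
#include <assert.h>
#include <stddef.h>

#define MCE_MIN_ENTRIES 80	/* historical capacity of the fixed buffer */
#define MCE_PER_CPU     2	/* records reserved per possible CPU */

/* Hypothetical stand-in for ilog2(): floor(log2(n)), n must be > 0. */
static unsigned int floor_log2(size_t n)
{
	unsigned int order = 0;

	assert(n > 0);
	while (n >>= 1)
		order++;
	return order;
}

/* Hypothetical stand-in for order_base_2(): ceil(log2(n)), n must be > 0. */
static unsigned int ceil_log2(size_t n)
{
	unsigned int order = floor_log2(n);

	/* Bump the order when n is not an exact power of two. */
	return (((size_t)1 << order) < n) ? order + 1 : order;
}

/* Pool size in bytes for "ncpus" possible CPUs and a given record size. */
static size_t mce_pool_bytes(unsigned int ncpus, size_t record_size)
{
	size_t nrecords = (size_t)ncpus * MCE_PER_CPU;

	/* Never go below the number of records the old buffer could hold. */
	if (nrecords < MCE_MIN_ENTRIES)
		nrecords = MCE_MIN_ENTRIES;

	/* Each record occupies one chunk of 2^order bytes. */
	return nrecords << ceil_log2(record_size);
}
```

With the assumed 136-byte record, floor_log2() gives 7, i.e. a 128-byte granule that is smaller than the record, so multiplying the record count by it would undersize the pool. ceil_log2() gives 8 (256 bytes), which is the rounded-up size the changelog refers to. Under the same assumption, mce_pool_bytes(8, 136) is 80 * 256 = 20480 bytes, and mce_pool_bytes(512, 136) is 1024 * 256 = 262144 bytes.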



2024-03-07 12:17:15

by Borislav Petkov

Subject: Re: [PATCH v2] x86/mce: Dynamically size space for machine check records

On Wed, Mar 06, 2024 at 04:02:56PM -0800, Tony Luck wrote:
> - ret = gen_pool_add(tmpp, (unsigned long)gen_pool_buf, MCE_POOLSZ, -1);
> + mce_numrecords = max(MCE_MIN_ENTRIES, num_possible_cpus() * MCE_PER_CPU);
> + mce_poolsz = mce_numrecords * (1 << order);
> + mce_pool = kmalloc(mce_poolsz, GFP_KERNEL);
> + if (!mce_pool) {
> + gen_pool_destroy(tmpp);
> + goto out;
> + }
> + ret = gen_pool_add(tmpp, (unsigned long)mce_pool, mce_poolsz, -1);
> if (ret) {
> gen_pool_destroy(tmpp);
> + kfree(mce_pool);
> goto out;

Might as well get rid of the out label too since you're not doing the
error handling pattern of jumping to err* labels and then unwinding. See
diff below.

Otherwise, patch looks ok to me, if we can test it quickly with all
possible scenarios and Linus does a -rc8, I probably can take it even
now...

Thx.

diff --git a/arch/x86/kernel/cpu/mce/genpool.c b/arch/x86/kernel/cpu/mce/genpool.c
index 42ce3dc97ca8..cadf28662a70 100644
--- a/arch/x86/kernel/cpu/mce/genpool.c
+++ b/arch/x86/kernel/cpu/mce/genpool.c
@@ -126,25 +126,24 @@ static int mce_gen_pool_create(void)
order = order_base_2(sizeof(struct mce_evt_llist));
tmpp = gen_pool_create(order, -1);
if (!tmpp)
- goto out;
+ return ret;

mce_numrecords = max(MCE_MIN_ENTRIES, num_possible_cpus() * MCE_PER_CPU);
mce_poolsz = mce_numrecords * (1 << order);
mce_pool = kmalloc(mce_poolsz, GFP_KERNEL);
if (!mce_pool) {
gen_pool_destroy(tmpp);
- goto out;
+ return ret;
}
ret = gen_pool_add(tmpp, (unsigned long)mce_pool, mce_poolsz, -1);
if (ret) {
gen_pool_destroy(tmpp);
kfree(mce_pool);
- goto out;
+ return ret;
}

mce_evt_pool = tmpp;

-out:
return ret;
}


--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette
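
For contrast, the err* label pattern mentioned above looks roughly like the following userspace sketch. struct two_bufs, two_bufs_init() and two_bufs_free() are hypothetical names, and malloc()/free() stand in for kernel allocators; nothing here is taken from genpool.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical object that owns two separately allocated buffers. */
struct two_bufs {
	void *a;
	void *b;
};

/*
 * Acquire resources in order; on failure, jump to the label that
 * releases everything acquired so far, in reverse order.
 */
static int two_bufs_init(struct two_bufs *t, size_t a_size, size_t b_size)
{
	int ret = -ENOMEM;

	assert(t != NULL);
	t->a = NULL;
	t->b = NULL;

	t->a = malloc(a_size);
	if (!t->a)
		goto err_out;

	t->b = malloc(b_size);
	if (!t->b)
		goto err_free_a;

	return 0;

err_free_a:
	free(t->a);
	t->a = NULL;
err_out:
	return ret;
}

/* Release both buffers; safe to call after a failed init. */
static void two_bufs_free(struct two_bufs *t)
{
	free(t->b);
	free(t->a);
	t->a = NULL;
	t->b = NULL;
}
```

The labels pay off when several failure points share a growing cleanup tail. In mce_gen_pool_create() each failure branch already does its own cleanup inline, so a lone "out:" label that only returns adds nothing over a direct return.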

2024-03-07 17:09:24

by Borislav Petkov

Subject: Re: [PATCH v2] x86/mce: Dynamically size space for machine check records

On Thu, Mar 07, 2024 at 08:59:53AM -0800, Sohil Mehta wrote:
> I was about to suggest the same thing and maybe slightly more. By
> initializing ret when really needed, I find the code a bit easier to
> follow. No strong preference here.

Except that "really needed" is done this way:

> diff --git a/arch/x86/kernel/cpu/mce/genpool.c b/arch/x86/kernel/cpu/mce/genpool.c
> index cadf28662a70..83a01d20bbd9 100644
> --- a/arch/x86/kernel/cpu/mce/genpool.c
> +++ b/arch/x86/kernel/cpu/mce/genpool.c
> @@ -118,22 +118,21 @@ int mce_gen_pool_add(struct mce *mce)
>
> static int mce_gen_pool_create(void)
> {
> - int mce_numrecords, mce_poolsz, order;
> + int mce_numrecords, mce_poolsz, order, ret;
> struct gen_pool *tmpp;
> - int ret = -ENOMEM;
> void *mce_pool;
>

ret = -ENOMEM;
> order = order_base_2(sizeof(struct mce_evt_llist));
> tmpp = gen_pool_create(order, -1);
> if (!tmpp)
> - return ret;
> + return -ENOMEM;
>

ret = -ENOMEM;
> mce_numrecords = max(MCE_MIN_ENTRIES, num_possible_cpus() * MCE_PER_CPU);
> mce_poolsz = mce_numrecords * (1 << order);
> mce_pool = kmalloc(mce_poolsz, GFP_KERNEL);
> if (!mce_pool) {
> gen_pool_destroy(tmpp);
> - return ret;
> + return -ENOMEM;

before each block, so that it is clear what this particular block is
going to return on error.

But those assignments get redundant so the current way is fine, I'd say.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-03-07 17:11:44

by Sohil Mehta

Subject: Re: [PATCH v2] x86/mce: Dynamically size space for machine check records

Hi Tony,

Overall, the patch looks good to me. Independent of the minor
suggestions below, please feel free to add:

Reviewed-by: Sohil Mehta <[email protected]>


On 3/7/2024 4:16 AM, Borislav Petkov wrote:
> On Wed, Mar 06, 2024 at 04:02:56PM -0800, Tony Luck wrote:
>> - ret = gen_pool_add(tmpp, (unsigned long)gen_pool_buf, MCE_POOLSZ, -1);
>> + mce_numrecords = max(MCE_MIN_ENTRIES, num_possible_cpus() * MCE_PER_CPU);
>> + mce_poolsz = mce_numrecords * (1 << order);
>> + mce_pool = kmalloc(mce_poolsz, GFP_KERNEL);
>> + if (!mce_pool) {
>> + gen_pool_destroy(tmpp);
>> + goto out;
>> + }
>> + ret = gen_pool_add(tmpp, (unsigned long)mce_pool, mce_poolsz, -1);
>> if (ret) {
>> gen_pool_destroy(tmpp);
>> + kfree(mce_pool);
>> goto out;
>
> Might as well get rid of the out label too since you're not doing the
> error handling pattern of jumping to err* labels and then unwinding. See
> diff below.
>

> diff --git a/arch/x86/kernel/cpu/mce/genpool.c b/arch/x86/kernel/cpu/mce/genpool.c
> index 42ce3dc97ca8..cadf28662a70 100644
> --- a/arch/x86/kernel/cpu/mce/genpool.c
> +++ b/arch/x86/kernel/cpu/mce/genpool.c
> @@ -126,25 +126,24 @@ static int mce_gen_pool_create(void)
> order = order_base_2(sizeof(struct mce_evt_llist));
> tmpp = gen_pool_create(order, -1);
> if (!tmpp)
> - goto out;
> + return ret;
>
> mce_numrecords = max(MCE_MIN_ENTRIES, num_possible_cpus() * MCE_PER_CPU);
> mce_poolsz = mce_numrecords * (1 << order);
> mce_pool = kmalloc(mce_poolsz, GFP_KERNEL);
> if (!mce_pool) {
> gen_pool_destroy(tmpp);
> - goto out;
> + return ret;
> }
> ret = gen_pool_add(tmpp, (unsigned long)mce_pool, mce_poolsz, -1);
> if (ret) {
> gen_pool_destroy(tmpp);
> kfree(mce_pool);
> - goto out;
> + return ret;
> }
>
> mce_evt_pool = tmpp;
>
> -out:
> return ret;
> }
>
>

I was about to suggest the same thing and maybe slightly more. By
initializing ret when really needed, I find the code a bit easier to
follow. No strong preference here.

diff --git a/arch/x86/kernel/cpu/mce/genpool.c b/arch/x86/kernel/cpu/mce/genpool.c
index cadf28662a70..83a01d20bbd9 100644
--- a/arch/x86/kernel/cpu/mce/genpool.c
+++ b/arch/x86/kernel/cpu/mce/genpool.c
@@ -118,22 +118,21 @@ int mce_gen_pool_add(struct mce *mce)

static int mce_gen_pool_create(void)
{
- int mce_numrecords, mce_poolsz, order;
+ int mce_numrecords, mce_poolsz, order, ret;
struct gen_pool *tmpp;
- int ret = -ENOMEM;
void *mce_pool;

order = order_base_2(sizeof(struct mce_evt_llist));
tmpp = gen_pool_create(order, -1);
if (!tmpp)
- return ret;
+ return -ENOMEM;

mce_numrecords = max(MCE_MIN_ENTRIES, num_possible_cpus() * MCE_PER_CPU);
mce_poolsz = mce_numrecords * (1 << order);
mce_pool = kmalloc(mce_poolsz, GFP_KERNEL);
if (!mce_pool) {
gen_pool_destroy(tmpp);
- return ret;
+ return -ENOMEM;
}
ret = gen_pool_add(tmpp, (unsigned long)mce_pool, mce_poolsz, -1);
if (ret) {