2022-03-01 10:29:19

by Ammar Faizi

[permalink] [raw]
Subject: [PATCH v4 0/2] Two x86 fixes

Hi,

Two fixes for x86 arch.

## Changelog

v4:
- Address comment from Greg, sha1 commit Fixes only needs to be 12 chars.
- Add the author of the fixed commit to the CC list.

v3:
- Fold in changes from Alviro, the previous version is still
leaking @bank[n].

v2:
- Fix wrong copy/paste.

## Short Summary

Patch 1, fixes the wrong asm constraint in delay_loop function.

Fortunately, the constraint violation that's fixed by patch 1 doesn't
yield any bug due to the nature of System V ABI. Should we backport
this?

Patch 2, fixes memory leak in mce/amd code.

Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Signed-off-by: Alviro Iskandar Setiawan <[email protected]>
Signed-off-by: Ammar Faizi <[email protected]>
---
Ammar Faizi (2):
x86/delay: Fix the wrong asm constraint in `delay_loop()`
x86/mce/amd: Fix memory leak when `threshold_create_bank()` fails

arch/x86/kernel/cpu/mce/amd.c | 16 ++++++++++------
arch/x86/lib/delay.c | 4 ++--
2 files changed, 12 insertions(+), 8 deletions(-)


base-commit: 7e57714cd0ad2d5bb90e50b5096a0e671dec1ef3
--
2.32.0


2022-03-01 10:35:07

by Ammar Faizi

[permalink] [raw]
Subject: [PATCH v4 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()`

From: Ammar Faizi <[email protected]>

The asm constraint does not reflect that the asm statement can modify
the value of @loops. But the asm statement in delay_loop() does change
the @loops.

If by any chance the compiler inlines this function, it may clobber
random stuff (e.g. local variable, important temporary value in reg,
etc.).

Fortunately, delay_loop() is only called indirectly (so it can't
inline), and then the register it clobbers is %rax (which is by the
nature of the calling convention, it's a caller saved register), so it
didn't yield any bug.

^ That shouldn't be an excuse for using the wrong constraint anyway.

This changes "a" (as an input) to "+a" (as an input and output).

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Jiri Hladky <[email protected]>
Fixes: e01b70ef3eb3 ("x86: fix bug in arch/i386/lib/delay.c file, delay_loop function")
Signed-off-by: Ammar Faizi <[email protected]>
---

Fortunately, the constraint violation that's fixed by patch 1 doesn't
yield any bug due to the nature of System V ABI. Should we backport
this?

arch/x86/lib/delay.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/lib/delay.c b/arch/x86/lib/delay.c
index 65d15df6212d..0e65d00e2339 100644
--- a/arch/x86/lib/delay.c
+++ b/arch/x86/lib/delay.c
@@ -54,8 +54,8 @@ static void delay_loop(u64 __loops)
" jnz 2b \n"
"3: dec %0 \n"

- : /* we don't need output */
- :"a" (loops)
+ : "+a" (loops)
+ :
);
}

--
2.32.0

2022-03-01 10:47:37

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v4 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()`

From: Ammar Faizi
> Sent: 01 March 2022 09:46
>
> The asm constraint does not reflect that the asm statement can modify
> the value of @loops. But the asm statement in delay_loop() does change
> the @loops.
>
> If by any chance the compiler inlines this function, it may clobber
> random stuff (e.g. local variable, important temporary value in reg,
> etc.).
>
> Fortunately, delay_loop() is only called indirectly (so it can't
> inline), and then the register it clobbers is %rax (which is by the
> nature of the calling convention, it's a caller saved register), so it
> didn't yield any bug.

Both the function pointers in that code need killing.
They only have two options (each) so conditional branches
will almost certainly always have been better.

I also wonder how well the comment
The additional jump magic is needed to get the timing stable
on all the CPU' we have to worry about.
applies to any modern cpu!
The code is unchanged since (at least) 2.6.27.
(It might have been moved from another file.)

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2022-03-01 15:39:34

by Ammar Faizi

[permalink] [raw]
Subject: [PATCH v4 2/2] x86/mce/amd: Fix memory leak when `threshold_create_bank()` fails

From: Ammar Faizi <[email protected]>

@bp is a local variable, calling mce_threshold_remove_device() when
threshold_create_bank() fails will not free the @bp. Note that
mce_threshold_remove_device() frees the @bp only if it's already
stored in the @threshold_banks per-CPU variable.

At that point, the @threshold_banks per-CPU variable is still NULL,
so the mce_threshold_remove_device() will just be a no-op and the
@bp is leaked.

Fix this by storing @bp to @threshold_banks before the loop, so in
case we fail, mce_threshold_remove_device() will free the @bp.

This bug is introduced by commit 6458de97fc15530b544 ("x86/mce/amd:
Straighten CPU hotplug path") [1].

Link: https://lore.kernel.org/all/[email protected] [1]

v4:
- Add the link to the commit reference again.

v3:
- Fold in changes from Alviro, the previous version is still
leaking @bank[n].

v2:
- No changes.

Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: [email protected] # v5.8+
Fixes: 6458de97fc15 ("x86/mce/amd: Straighten CPU hotplug path")
Co-authored-by: Alviro Iskandar Setiawan <[email protected]>
Signed-off-by: Alviro Iskandar Setiawan <[email protected]>
Signed-off-by: Ammar Faizi <[email protected]>
---
arch/x86/kernel/cpu/mce/amd.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 9f4b508886dd..a5ef161facd9 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -1346,19 +1346,23 @@ int mce_threshold_create_device(unsigned int cpu)
if (!bp)
return -ENOMEM;

+ /*
+ * If we fail, mce_threshold_remove_device() will free the @bp
+ * via @threshold_banks.
+ */
+ this_cpu_write(threshold_banks, bp);
+
for (bank = 0; bank < numbanks; ++bank) {
if (!(this_cpu_read(bank_map) & (1 << bank)))
continue;
err = threshold_create_bank(bp, cpu, bank);
- if (err)
- goto out_err;
+ if (err) {
+ mce_threshold_remove_device(cpu);
+ return err;
+ }
}
- this_cpu_write(threshold_banks, bp);

if (thresholding_irq_en)
mce_threshold_vector = amd_threshold_interrupt;
return 0;
-out_err:
- mce_threshold_remove_device(cpu);
- return err;
}
--
2.32.0

2022-03-01 17:03:29

by Alviro Iskandar Setiawan

[permalink] [raw]
Subject: Re: [PATCH v4 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()`

On Tue, Mar 1, 2022 at 4:46 PM Ammar Faizi wrote:
> Fortunately, the constraint violation that's fixed by patch 1 doesn't
> yield any bug due to the nature of System V ABI. Should we backport
> this?

hi sir, it might also be interesting to know that even if it never be
inlined, it's still potential to break.

for example this code (https://godbolt.org/z/xWMTxhTET)

__attribute__((__noinline__)) static void x(int a)
{
asm("xorl\t%%r8d, %%r8d"::"a"(a));
}

extern int p(void);

int f(void)
{
int ret = p();
x(ret);
return ret;
}

translates to this asm

x:
movl %edi, %eax
xorl %r8d, %r8d
ret
f:
subq $8, %rsp
call p
movl %eax, %r8d
movl %eax, %edi
call x
movl %r8d, %eax
addq $8, %rsp
ret

See the %r8d? It should be clobbered by a function call too. But since
no one tells the compiler that we clobber %r8d, it assumes %r8d never
changes after that call. The compiler thinks x() is static and will
not clobber %r8d, even the ABI says %r8d will be clobbered by a
function call. So i think it should be backported to the stable
kernel, it's still a fix

-- Viro

2022-03-02 17:45:35

by Yazen Ghannam

[permalink] [raw]
Subject: Re: [PATCH v4 2/2] x86/mce/amd: Fix memory leak when `threshold_create_bank()` fails

On Tue, Mar 01, 2022 at 04:46:08PM +0700, Ammar Faizi wrote:
> From: Ammar Faizi <[email protected]>
>

Hi Ammar,

...

> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> index 9f4b508886dd..a5ef161facd9 100644
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mce/amd.c
> @@ -1346,19 +1346,23 @@ int mce_threshold_create_device(unsigned int cpu)
> if (!bp)
> return -ENOMEM;
>
> + /*
> + * If we fail, mce_threshold_remove_device() will free the @bp
> + * via @threshold_banks.
> + */
> + this_cpu_write(threshold_banks, bp);
> +
> for (bank = 0; bank < numbanks; ++bank) {
> if (!(this_cpu_read(bank_map) & (1 << bank)))
> continue;
> err = threshold_create_bank(bp, cpu, bank);
> - if (err)
> - goto out_err;
> + if (err) {
> + mce_threshold_remove_device(cpu);
> + return err;
> + }
> }
> - this_cpu_write(threshold_banks, bp);
>

The threshold interrupt handler uses this pointer. I think the goal here is to
set this pointer when the list is fully formed and clear this pointer before
making any changes to the list. Otherwise, the interrupt handler will operate
on incomplete data if an interrupt comes in the middle of these updates.

The changes below should deal with memory leak issue while avoiding a race
with the threshold interrupt. What do you think?

Thanks,
Yazen

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 1940d305db1c..8f3b7859331d 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -1294,10 +1294,22 @@ static void threshold_remove_bank(struct threshold_bank *bank)
kfree(bank);
}

+void _mce_threshold_remove_device(struct threshold_bank **bp)
+{
+ unsigned int bank, numbanks = this_cpu_read(mce_num_banks);
+
+ for (bank = 0; bank < numbanks; bank++) {
+ if (bp[bank]) {
+ threshold_remove_bank(bp[bank]);
+ bp[bank] = NULL;
+ }
+ }
+ kfree(bp);
+}
+
int mce_threshold_remove_device(unsigned int cpu)
{
struct threshold_bank **bp = this_cpu_read(threshold_banks);
- unsigned int bank, numbanks = this_cpu_read(mce_num_banks);

if (!bp)
return 0;
@@ -1308,13 +1320,7 @@ int mce_threshold_remove_device(unsigned int cpu)
*/
this_cpu_write(threshold_banks, NULL);

- for (bank = 0; bank < numbanks; bank++) {
- if (bp[bank]) {
- threshold_remove_bank(bp[bank]);
- bp[bank] = NULL;
- }
- }
- kfree(bp);
+ _mce_threshold_remove_device(bp);
return 0;
}

@@ -1360,6 +1366,6 @@ int mce_threshold_create_device(unsigned int cpu)
mce_threshold_vector = amd_threshold_interrupt;
return 0;
out_err:
- mce_threshold_remove_device(cpu);
+ _mce_threshold_remove_device(bp);
return err;
}

2022-03-03 00:34:51

by Ammar Faizi

[permalink] [raw]
Subject: Re: [PATCH v4 2/2] x86/mce/amd: Fix memory leak when `threshold_create_bank()` fails

On 3/3/22 6:20 AM, Ammar Faizi wrote:
> On 3/3/22 12:26 AM, Yazen Ghannam wrote:
>> Hi Ammar,
>
> Hi Yazen,
>
>> ...
>> The threshold interrupt handler uses this pointer. I think the goal here is to
>> set this pointer when the list is fully formed and clear this pointer before
>> making any changes to the list. Otherwise, the interrupt handler will operate
>> on incomplete data if an interrupt comes in the middle of these updates.
>>
>> The changes below should deal with memory leak issue while avoiding a race
>> with the threshold interrupt. What do you think?
>
> Thanks for taking a look into this. I didn't notice that before. The
> changes look good to me, extra improvements:
>
> 1) _mce_threshold_remove_device() should be static as we don't use it
>    in another translation unit.
> 2) Minor cleanup, we don't need "goto out_err", just early return
>    directly.
>
> I will fold them in...
>

Please review the patch below, if you think it looks good, I will
send this for the v5 series. I added your sign-off.

From cae3965734a67d11a5286c612dfddf52398defc8 Mon Sep 17 00:00:00 2001
From: Ammar Faizi <[email protected]>
Date: Thu, 3 Mar 2022 05:07:38 +0700
Subject: [PATCH v5 2/2] x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails

In mce_threshold_create_device(), when threshold_create_bank() fails,
the @bp will be leaked, because mce_threshold_remove_device() will
not free the @bp. It only frees the @bp when we've already written
the @bp to the @threshold_banks per-CPU variable, but at the point,
we haven't.

Fix this by extracting the cleanup part into a new static function
_mce_threshold_remove_device(), then use it from create and remove
device function.

Also, eliminate the "goto out_err". Just early return inside the loop
when we fail.

Cc: Borislav Petkov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: [email protected] # v5.8+
Fixes: 6458de97fc15 ("x86/mce/amd: Straighten CPU hotplug path")
Co-authored-by: Alviro Iskandar Setiawan <[email protected]>
Signed-off-by: Alviro Iskandar Setiawan <[email protected]>
Co-authored-by: Yazen Ghannam <[email protected]>
Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Ammar Faizi <[email protected]>
---
arch/x86/kernel/cpu/mce/amd.c | 31 ++++++++++++++++++-------------
1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 9f4b508886dd..ac7246a4de08 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -1293,10 +1293,22 @@ static void threshold_remove_bank(struct threshold_bank *bank)
kfree(bank);
}

+static void _mce_threshold_remove_device(struct threshold_bank **bp)
+{
+ unsigned int bank, numbanks = this_cpu_read(mce_num_banks);
+
+ for (bank = 0; bank < numbanks; bank++) {
+ if (bp[bank]) {
+ threshold_remove_bank(bp[bank]);
+ bp[bank] = NULL;
+ }
+ }
+ kfree(bp);
+}
+
int mce_threshold_remove_device(unsigned int cpu)
{
struct threshold_bank **bp = this_cpu_read(threshold_banks);
- unsigned int bank, numbanks = this_cpu_read(mce_num_banks);

if (!bp)
return 0;
@@ -1307,13 +1319,7 @@ int mce_threshold_remove_device(unsigned int cpu)
*/
this_cpu_write(threshold_banks, NULL);

- for (bank = 0; bank < numbanks; bank++) {
- if (bp[bank]) {
- threshold_remove_bank(bp[bank]);
- bp[bank] = NULL;
- }
- }
- kfree(bp);
+ _mce_threshold_remove_device(bp);
return 0;
}

@@ -1350,15 +1356,14 @@ int mce_threshold_create_device(unsigned int cpu)
if (!(this_cpu_read(bank_map) & (1 << bank)))
continue;
err = threshold_create_bank(bp, cpu, bank);
- if (err)
- goto out_err;
+ if (err) {
+ _mce_threshold_remove_device(bp);
+ return err;
+ }
}
this_cpu_write(threshold_banks, bp);

if (thresholding_irq_en)
mce_threshold_vector = amd_threshold_interrupt;
return 0;
-out_err:
- mce_threshold_remove_device(cpu);
- return err;
}

--
Ammar Faizi

2022-03-03 00:42:42

by Ammar Faizi

[permalink] [raw]
Subject: Re: [PATCH v4 2/2] x86/mce/amd: Fix memory leak when `threshold_create_bank()` fails

On 3/3/22 12:26 AM, Yazen Ghannam wrote:
> Hi Ammar,

Hi Yazen,

> ...
> The threshold interrupt handler uses this pointer. I think the goal here is to
> set this pointer when the list is fully formed and clear this pointer before
> making any changes to the list. Otherwise, the interrupt handler will operate
> on incomplete data if an interrupt comes in the middle of these updates.
>
> The changes below should deal with memory leak issue while avoiding a race
> with the threshold interrupt. What do you think?

Thanks for taking a look into this. I didn't notice that before. The
changes look good to me, extra improvements:

1) _mce_threshold_remove_device() should be static as we don't use it
in another translation unit.
2) Minor cleanup, we don't need "goto out_err", just early return
directly.

I will fold them in...

--
Ammar Faizi

2022-03-03 00:47:15

by Ammar Faizi

[permalink] [raw]
Subject: Re: [PATCH v4 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()`

On 3/1/22 6:33 PM, Alviro Iskandar Setiawan wrote:
> hi sir, it might also be interesting to know that even if it never be
> inlined, it's still potential to break.
>
> for example this code (https://godbolt.org/z/xWMTxhTET)
>
> __attribute__((__noinline__)) static void x(int a)
> {
> asm("xorl\t%%r8d, %%r8d"::"a"(a));
> }
>
> extern int p(void);
>
> int f(void)
> {
> int ret = p();
> x(ret);
> return ret;
> }
>
> translates to this asm
>
> x:
> movl %edi, %eax
> xorl %r8d, %r8d
> ret
> f:
> subq $8, %rsp
> call p
> movl %eax, %r8d
> movl %eax, %edi
> call x
> movl %r8d, %eax
> addq $8, %rsp
> ret
>
> See the %r8d? It should be clobbered by a function call too. But since
> no one tells the compiler that we clobber %r8d, it assumes %r8d never
> changes after that call. The compiler thinks x() is static and will
> not clobber %r8d, even the ABI says %r8d will be clobbered by a
> function call. So i think it should be backported to the stable
> kernel, it's still a fix

Thanks. I will add CC stable in the v5.

--
Ammar Faizi

2022-03-03 00:51:19

by Ammar Faizi

[permalink] [raw]
Subject: Re: [PATCH v4 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()`

On 3/1/22 4:54 PM, David Laight wrote:
> Both the function pointers in that code need killing.
> They only have two options (each) so conditional branches
> will almost certainly always have been better.

Yes, I agree with simply using conditional branches to handle this
case. But to keep the changes minimal for the stable tree, let's fix
the obvious real bug first. Someone can refactor it later, but I
don't see that as an urgent thing to refactor.

> I also wonder how well the comment
> The additional jump magic is needed to get the timing stable
> on all the CPU' we have to worry about.
> applies to any modern cpu!
> The code is unchanged since (at least) 2.6.27.
> (It might have been moved from another file.)
Not sure about that...

Thanks for the feedback.

--
Ammar Faizi

2022-03-03 01:04:04

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v4 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()`

From: Alviro Iskandar Setiawan
> Sent: 01 March 2022 11:34
>
> On Tue, Mar 1, 2022 at 4:46 PM Ammar Faizi wrote:
> > Fortunately, the constraint violation that's fixed by patch 1 doesn't
> > yield any bug due to the nature of System V ABI. Should we backport
> > this?
>
> hi sir, it might also be interesting to know that even if it never be
> inlined, it's still potential to break.
>
> for example this code (https://godbolt.org/z/xWMTxhTET)
>
> __attribute__((__noinline__)) static void x(int a)
> {
> asm("xorl\t%%r8d, %%r8d"::"a"(a));
> }

But this code isn't doing that.
In your example the compiler has looked at the static function
and realised that is doesn't use r8 so it need not be saved
even though it is a volatile register.

In this code the compiler knows %ax is being used, it just
doesn't know it is changed - so could assume the value
is unchanged.

The only code that is likely to break is:

int f(int d)
{
d += 10;
__delay(d);
return d;
}

Which might manage to return the value of %eax modified by the asm.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2022-03-03 02:02:26

by Alviro Iskandar Setiawan

[permalink] [raw]
Subject: Re: [PATCH v4 2/2] x86/mce/amd: Fix memory leak when `threshold_create_bank()` fails

On Thu, 3 Mar 2022 06:27:33 +0700, Ammar Faizi wrote:
> +static void _mce_threshold_remove_device(struct threshold_bank **bp)
> +{
> + unsigned int bank, numbanks = this_cpu_read(mce_num_banks);
> +
> + for (bank = 0; bank < numbanks; bank++) {
> + if (bp[bank]) {
> + threshold_remove_bank(bp[bank]);
> + bp[bank] = NULL;
> + }
> + }
> + kfree(bp);
> +}

hi sir, i think this can be improved again, we can avoid calling
this_cpu_read(mce_num_banks) twice if we pass the numbanks as an
argument, plz review the changes below

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 9f4b508886dd..e492efab68a2 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -1293,10 +1293,23 @@ static void threshold_remove_bank(struct threshold_bank *bank)
kfree(bank);
}

+static void _mce_threshold_remove_device(struct threshold_bank **bp,
+ unsigned int numbanks)
+{
+ unsigned int bank;
+
+ for (bank = 0; bank < numbanks; bank++) {
+ if (bp[bank]) {
+ threshold_remove_bank(bp[bank]);
+ bp[bank] = NULL;
+ }
+ }
+ kfree(bp);
+}
+
int mce_threshold_remove_device(unsigned int cpu)
{
struct threshold_bank **bp = this_cpu_read(threshold_banks);
- unsigned int bank, numbanks = this_cpu_read(mce_num_banks);

if (!bp)
return 0;
@@ -1307,13 +1320,7 @@ int mce_threshold_remove_device(unsigned int cpu)
*/
this_cpu_write(threshold_banks, NULL);

- for (bank = 0; bank < numbanks; bank++) {
- if (bp[bank]) {
- threshold_remove_bank(bp[bank]);
- bp[bank] = NULL;
- }
- }
- kfree(bp);
+ _mce_threshold_remove_device(bp, this_cpu_read(mce_num_banks));
return 0;
}

@@ -1350,15 +1357,14 @@ int mce_threshold_create_device(unsigned int cpu)
if (!(this_cpu_read(bank_map) & (1 << bank)))
continue;
err = threshold_create_bank(bp, cpu, bank);
- if (err)
- goto out_err;
+ if (err) {
+ _mce_threshold_remove_device(bp, numbanks);
+ return err;
+ }
}
this_cpu_write(threshold_banks, bp);

if (thresholding_irq_en)
mce_threshold_vector = amd_threshold_interrupt;
return 0;
-out_err:
- mce_threshold_remove_device(cpu);
- return err;
}

-- Viro

2022-03-03 02:09:03

by Ammar Faizi

[permalink] [raw]
Subject: Re: [PATCH v4 2/2] x86/mce/amd: Fix memory leak when `threshold_create_bank()` fails

On 3/3/22 8:58 AM, Alviro Iskandar Setiawan wrote:
> hi sir, i think this can be improved again, we can avoid calling
> this_cpu_read(mce_num_banks) twice if we pass the numbanks as an
> argument, plz review the changes below

OK, nice improvement. I will fold this in...

--
Ammar Faizi

2022-03-03 02:38:43

by Ammar Faizi

[permalink] [raw]
Subject: Re: [PATCH v4 2/2] x86/mce/amd: Fix memory leak when `threshold_create_bank()` fails

On 3/3/22 9:07 AM, Ammar Faizi wrote:
> On 3/3/22 8:58 AM, Alviro Iskandar Setiawan wrote:
>> hi sir, i think this can be improved again, we can avoid calling
>> this_cpu_read(mce_num_banks) twice if we pass the numbanks as an
>> argument, plz review the changes below
>
> OK, nice improvement. I will fold this in...
>

It looks like this now. Yazen, Alviro, please review the
following patch. If you think it looks good, I will submit
it for the v5.

From 91a447f837d502b7a040cd68f333fb98f4b941d9 Mon Sep 17 00:00:00 2001
From: Ammar Faizi <[email protected]>
Date: Thu, 3 Mar 2022 09:22:17 +0700
Subject: [PATCH v5 2/2] x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails

In mce_threshold_create_device(), if threshold_create_bank() fails, the
@bp will be leaked, because mce_threshold_remove_device() will not free
the @bp. It only frees the @bp when we've already written the @bp to
the @threshold_banks per-CPU variable, but at the point, we haven't.

Fix this by extracting the cleanup part into a new static function
_mce_threshold_remove_device(), then use it from create/remove device
function. Also, eliminate the "goto out_err". Just early return inside
the loop if we fail.

Cc: Borislav Petkov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: [email protected] # v5.8+
Fixes: 6458de97fc15 ("x86/mce/amd: Straighten CPU hotplug path")
Co-authored-by: Alviro Iskandar Setiawan <[email protected]>
Signed-off-by: Alviro Iskandar Setiawan <[email protected]>
Co-authored-by: Yazen Ghannam <[email protected]>
Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Ammar Faizi <[email protected]>
---
arch/x86/kernel/cpu/mce/amd.c | 32 +++++++++++++++++++-------------
1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 9f4b508886dd..e492efab68a2 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -1293,10 +1293,23 @@ static void threshold_remove_bank(struct threshold_bank *bank)
kfree(bank);
}

+static void _mce_threshold_remove_device(struct threshold_bank **bp,
+ unsigned int numbanks)
+{
+ unsigned int bank;
+
+ for (bank = 0; bank < numbanks; bank++) {
+ if (bp[bank]) {
+ threshold_remove_bank(bp[bank]);
+ bp[bank] = NULL;
+ }
+ }
+ kfree(bp);
+}
+
int mce_threshold_remove_device(unsigned int cpu)
{
struct threshold_bank **bp = this_cpu_read(threshold_banks);
- unsigned int bank, numbanks = this_cpu_read(mce_num_banks);

if (!bp)
return 0;
@@ -1307,13 +1320,7 @@ int mce_threshold_remove_device(unsigned int cpu)
*/
this_cpu_write(threshold_banks, NULL);

- for (bank = 0; bank < numbanks; bank++) {
- if (bp[bank]) {
- threshold_remove_bank(bp[bank]);
- bp[bank] = NULL;
- }
- }
- kfree(bp);
+ _mce_threshold_remove_device(bp, this_cpu_read(mce_num_banks));
return 0;
}

@@ -1350,15 +1357,14 @@ int mce_threshold_create_device(unsigned int cpu)
if (!(this_cpu_read(bank_map) & (1 << bank)))
continue;
err = threshold_create_bank(bp, cpu, bank);
- if (err)
- goto out_err;
+ if (err) {
+ _mce_threshold_remove_device(bp, numbanks);
+ return err;
+ }
}
this_cpu_write(threshold_banks, bp);

if (thresholding_irq_en)
mce_threshold_vector = amd_threshold_interrupt;
return 0;
-out_err:
- mce_threshold_remove_device(cpu);
- return err;
}
--
Ammar Faizi

2022-03-03 02:51:41

by Alviro Iskandar Setiawan

[permalink] [raw]
Subject: Re: [PATCH v4 2/2] x86/mce/amd: Fix memory leak when `threshold_create_bank()` fails

On Thu, Mar 3, 2022 at 9:32 AM Ammar Faizi wrote:
> On 3/3/22 9:07 AM, Ammar Faizi wrote:
> > On 3/3/22 8:58 AM, Alviro Iskandar Setiawan wrote:
> > > hi sir, i think this can be improved again, we can avoid calling
> > > this_cpu_read(mce_num_banks) twice if we pass the numbanks as an
> > > argument, plz review the changes below
> >
> > OK, nice improvement. I will fold this in...
> >
>
> It looks like this now. Yazen, Alviro, please review the
> following patch. If you think it looks good, I will submit
> it for the v5.
>

i think it looks good, thanks sir

-- Viro

2022-03-07 09:30:33

by Ammar Faizi

[permalink] [raw]
Subject: Re: [PATCH v4 2/2] x86/mce/amd: Fix memory leak when `threshold_create_bank()` fails

On 3/3/22 9:32 AM, Ammar Faizi wrote:
> It looks like this now. Yazen, Alviro, please review the
> following patch. If you think it looks good, I will submit
> it for the v5.
>
> From 91a447f837d502b7a040cd68f333fb98f4b941d9 Mon Sep 17 00:00:00 2001
> From: Ammar Faizi <[email protected]>
> Date: Thu, 3 Mar 2022 09:22:17 +0700
> Subject: [PATCH v5 2/2] x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails
>
> In mce_threshold_create_device(), if threshold_create_bank() fails, the
> @bp will be leaked, because mce_threshold_remove_device() will not free
> the @bp. It only frees the @bp when we've already written the @bp to
> the @threshold_banks per-CPU variable, but at the point, we haven't.
>
> Fix this by extracting the cleanup part into a new static function
> _mce_threshold_remove_device(), then use it from create/remove device
> function. Also, eliminate the "goto out_err". Just early return inside
> the loop if we fail.
>
> Cc: Borislav Petkov <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> Cc: [email protected] # v5.8+
> Fixes: 6458de97fc15 ("x86/mce/amd: Straighten CPU hotplug path")
> Co-authored-by: Alviro Iskandar Setiawan <[email protected]>
> Signed-off-by: Alviro Iskandar Setiawan <[email protected]>
> Co-authored-by: Yazen Ghannam <[email protected]>
> Signed-off-by: Yazen Ghannam <[email protected]>
> Signed-off-by: Ammar Faizi <[email protected]>


Hi,

It's Monday morning...

Friendly ping for Yazen from AMD. What do you think of this patch? If
it looks good, I will submit it for the v5 revision. See the ref below
if you lost track of the full message.

Ref: https://lore.kernel.org/lkml/[email protected]/

Thanks!

--
Ammar Faizi

2022-03-10 01:54:11

by Yazen Ghannam

[permalink] [raw]
Subject: Re: [PATCH v4 2/2] x86/mce/amd: Fix memory leak when `threshold_create_bank()` fails

On Mon, Mar 07, 2022 at 07:27:44AM +0700, Ammar Faizi wrote:
> On 3/3/22 9:32 AM, Ammar Faizi wrote:
> > It looks like this now. Yazen, Alviro, please review the
> > following patch. If you think it looks good, I will submit
> > it for the v5.
> >
> > From 91a447f837d502b7a040cd68f333fb98f4b941d9 Mon Sep 17 00:00:00 2001
> > From: Ammar Faizi <[email protected]>
> > Date: Thu, 3 Mar 2022 09:22:17 +0700
> > Subject: [PATCH v5 2/2] x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails
> >
> > In mce_threshold_create_device(), if threshold_create_bank() fails, the
> > @bp will be leaked, because mce_threshold_remove_device() will not free
> > the @bp. It only frees the @bp when we've already written the @bp to
> > the @threshold_banks per-CPU variable, but at the point, we haven't.
> >
> > Fix this by extracting the cleanup part into a new static function
> > _mce_threshold_remove_device(), then use it from create/remove device
> > function. Also, eliminate the "goto out_err". Just early return inside
> > the loop if we fail.
> >

The commit message should use passive voice: no "I" or "we".

Otherwise, I think the patch looks good.

Thanks,
Yazen

2022-03-10 13:00:49

by Ammar Faizi

[permalink] [raw]
Subject: Re: [PATCH v4 2/2] x86/mce/amd: Fix memory leak when `threshold_create_bank()` fails

On 3/10/22 3:55 AM, Yazen Ghannam wrote:
> The commit message should use passive voice: no "I" or "we".
>
> Otherwise, I think the patch looks good.

Fixed in the v5 revision, thanks!

Link: https://lore.kernel.org/lkml/[email protected]/

--
Ammar Faizi