I am Seunghun Han and a senior security researcher at National Security
Research Institute of South Korea.
I found a critical security issue which can make kernel panic in userspace.
After analyzing the issue carefully, I found that MCE driver in the kernel
has a problem which can be occurred in SMP environment.
The check_interval file in
/sys/devices/system/machinecheck/machinecheck<cpu number> directory is a
global timer value for MCE polling. If it is changed by one CPU, MCE driver
in kernel calls mce_restart() function and broadcasts the event to other
CPUs to delete and restart MCE polling timer.
The __mcheck_cpu_init_timer() function which is called by mce_restart()
function initializes the mce_timer variable, and the "lock" in mce_timer is
also reinitialized. If more than one CPU write a specific value to
check_interval file concurrently, one can initialize the "lock" in mce_timer
while the others are handling "lock" in mce_timer. This problem causes some
synchronization errors such as kernel panic and kernel hang.
It is a critical security problem because the attacker can make kernel panic
by writing a value to the check_interval file in userspace, and it can be
used for Denial-of-Service (DoS) attack.
To fix this problem, I changed the __mcheck_cpu_init_timer() function to
reuse mce_timer instead of initializing it. The purpose of the function is
to restart the timer and it can be archived by calling
Signed-off-by: Seunghun Han <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 3d8c573a9a27..d72f2f74f4d7 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1771,7 +1771,6 @@ static void __mcheck_cpu_init_timer(void)
{
struct timer_list *t = this_cpu_ptr(&mce_timer);
- timer_setup(t, mce_timer_fn, TIMER_PINNED);
mce_start_timer(t);
}
@@ -2029,8 +2028,10 @@ static void mce_enable_ce(void *all)
return;
cmci_reenable();
cmci_recheck();
- if (all)
+ if (all) {
+ del_timer_sync(this_cpu_ptr(&mce_timer));
__mcheck_cpu_init_timer();
+ }
}
static struct bus_type mce_subsys = {
--
2.11.0
On Fri, Feb 23, 2018 at 07:13:50PM +0900, Seunghun Han wrote:
> I am Seunghun Han and a senior security researcher at National Security
> Research Institute of South Korea.
>
> I found a critical security issue which can make kernel panic in userspace.
> After analyzing the issue carefully, I found that MCE driver in the kernel
> has a problem which can be occurred in SMP environment.
>
> The check_interval file in
> /sys/devices/system/machinecheck/machinecheck<cpu number> directory is a
> global timer value for MCE polling. If it is changed by one CPU, MCE driver
> in kernel calls mce_restart() function and broadcasts the event to other
Right, so I'm thinking that doing that per-CPU configuration doesn't
make a whole lot of sense. It is not something that needs to happen very
often and it is done globally anyway.
So what we should do here, IMO, is make mce_restart() grab a mutex and
thus serialize all those sysfs writes. It will naturally also slow down
any hammering from userspace which we should not allow anyway.
Tony, what do you think?
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Fri, Feb 23, 2018 at 07:13:50PM +0900, Seunghun Han wrote:
> I am Seunghun Han and a senior security researcher at National Security
> Research Institute of South Korea.
>
> I found a critical security issue which can make kernel panic in userspace.
> After analyzing the issue carefully, I found that MCE driver in the kernel
> has a problem which can be occurred in SMP environment.
>
> The check_interval file in
> /sys/devices/system/machinecheck/machinecheck<cpu number> directory is a
> global timer value for MCE polling. If it is changed by one CPU, MCE driver
> in kernel calls mce_restart() function and broadcasts the event to other
> CPUs to delete and restart MCE polling timer.
>
> The __mcheck_cpu_init_timer() function which is called by mce_restart()
> function initializes the mce_timer variable, and the "lock" in mce_timer is
> also reinitialized. If more than one CPU write a specific value to
> check_interval file concurrently, one can initialize the "lock" in mce_timer
> while the others are handling "lock" in mce_timer. This problem causes some
> synchronization errors such as kernel panic and kernel hang.
>
> It is a critical security problem because the attacker can make kernel panic
> by writing a value to the check_interval file in userspace, and it can be
> used for Denial-of-Service (DoS) attack.
As only root can write to that file, it's not that critical of an issue,
but yes, this is a problem. Nice find and fix.
>
> To fix this problem, I changed the __mcheck_cpu_init_timer() function to
> reuse mce_timer instead of initializing it. The purpose of the function is
> to restart the timer and it can be archived by calling
>
> Signed-off-by: Seunghun Han <[email protected]>
Cc: stable <[email protected]>
Acked-by: Greg Kroah-Hartman <[email protected]>
Hello, Greg.
2018-02-23 19:52 GMT+09:00 Greg Kroah-Hartman <[email protected]>:
> On Fri, Feb 23, 2018 at 07:13:50PM +0900, Seunghun Han wrote:
>> I am Seunghun Han and a senior security researcher at National Security
>> Research Institute of South Korea.
>>
>> I found a critical security issue which can make kernel panic in userspace.
>> After analyzing the issue carefully, I found that MCE driver in the kernel
>> has a problem which can be occurred in SMP environment.
>>
>> The check_interval file in
>> /sys/devices/system/machinecheck/machinecheck<cpu number> directory is a
>> global timer value for MCE polling. If it is changed by one CPU, MCE driver
>> in kernel calls mce_restart() function and broadcasts the event to other
>> CPUs to delete and restart MCE polling timer.
>>
>> The __mcheck_cpu_init_timer() function which is called by mce_restart()
>> function initializes the mce_timer variable, and the "lock" in mce_timer is
>> also reinitialized. If more than one CPU write a specific value to
>> check_interval file concurrently, one can initialize the "lock" in mce_timer
>> while the others are handling "lock" in mce_timer. This problem causes some
>> synchronization errors such as kernel panic and kernel hang.
>>
>> It is a critical security problem because the attacker can make kernel panic
>> by writing a value to the check_interval file in userspace, and it can be
>> used for Denial-of-Service (DoS) attack.
>
> As only root can write to that file, it's not that critical of an issue,
> but yes, this is a problem. Nice find and fix.
I agree with your opinion.
Thank you for your advice.
Best regards.
Seunghun.
>>
>> To fix this problem, I changed the __mcheck_cpu_init_timer() function to
>> reuse mce_timer instead of initializing it. The purpose of the function is
>> to restart the timer and it can be archived by calling
>>
>> Signed-off-by: Seunghun Han <[email protected]>
>
> Cc: stable <[email protected]>
> Acked-by: Greg Kroah-Hartman <[email protected]>
>
On Mon, Feb 26, 2018 at 05:05:04AM +0900, Seunghun Han wrote:
> >> It is a critical security problem because the attacker can make kernel panic
> >> by writing a value to the check_interval file in userspace, and it can be
> >> used for Denial-of-Service (DoS) attack.
> >
> > As only root can write to that file, it's not that critical of an issue,
> > but yes, this is a problem. Nice find and fix.
This is still the wrong fix. You need to:
1. check the old value of check_interval in store_int_with_restart() and
exit early if it is the same.
2. have mce_restart() grab a newly defined mutex, say, mce_sysfs_mutex
or so, which synchronizes all CPUs so that their timers get deleted and
reinitialized in the proper order.
Thx.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
Hello, Borislav.
2018-02-28 18:32 GMT+09:00 Borislav Petkov <[email protected]>:
> On Mon, Feb 26, 2018 at 05:05:04AM +0900, Seunghun Han wrote:
>> >> It is a critical security problem because the attacker can make kernel panic
>> >> by writing a value to the check_interval file in userspace, and it can be
>> >> used for Denial-of-Service (DoS) attack.
>> >
>> > As only root can write to that file, it's not that critical of an issue,
>> > but yes, this is a problem. Nice find and fix.
>
> This is still the wrong fix. You need to:
>
> 1. check the old value of check_interval in store_int_with_restart() and
> exit early if it is the same.
>
> 2. have mce_restart() grab a newly defined mutex, say, mce_sysfs_mutex
> or so, which synchronizes all CPUs so that their timers get deleted and
> reinitialized in the proper order.
Thank you for your advice.
I will change my patch like that and send it again.
Best regard.
Seunghun.
>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> Good mailing practices for 400: avoid top-posting and trim the reply.