2014-02-19 06:56:32

by Xishi Qiu

Subject: mm: OS boot failed when set command-line kmemcheck=1

Hi all,

With CONFIG_KMEMCHECK=y and "kmemcheck=1" on the kernel command line,
the OS fails to boot. The kernel is v3.14.0-rc3.

With "kmemcheck=1 nowatchdog", the OS boots successfully.

Here is the log from the failed boot:
[ 23.586826] Freeing unused kernel memory: 1160K (ffff8800014de000 - ffff880001600000)
[ 23.600248] Freeing unused kernel memory: 1696K (ffff880001858000 - ffff880001a00000)
[ 23.615534] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000005
[ 23.615534]
[ 23.624885] CPU: 0 PID: 1 Comm: init Tainted: G W 3.14.0-rc3-0.1-default+ #1
[ 23.632957] Hardware name: Huawei Technologies Co., Ltd. Tecal RH2285 /BC11BTSA , BIOS CTSAV036 04/27/2011
[ 23.644661] ffff880c1dd28000 ffff880c1dd31c48 ffffffff814ca491 ffff880c1dd31cc8
[ 23.652416] ffffffff814ca1e6 0000000000000010 ffff880c1dd31cd8 ffff880c1dd31c78
[ 23.660171] 0000000000000027 ffff880c1dcb8280 0000000000000005 ffff880c1dd28000
[ 23.667931] Call Trace:
[ 23.670482] [<ffffffff814ca491>] dump_stack+0x6a/0x79
[ 23.675712] [<ffffffff814ca1e6>] panic+0xb9/0x1f4
[ 23.680599] [<ffffffff8104f78e>] forget_original_parent+0x42e/0x430
[ 23.687043] [<ffffffff81107da0>] ? perf_cgroup_switch+0x170/0x170
[ 23.693314] [<ffffffff8104f7a1>] exit_notify+0x11/0x140
[ 23.698722] [<ffffffff8104fb00>] do_exit+0x230/0x490
[ 23.703865] [<ffffffff8104fda3>] do_group_exit+0x43/0xb0
[ 23.709357] [<ffffffff8105fb31>] get_signal_to_deliver+0x241/0x4b0
[ 23.715713] [<ffffffff81002a0c>] do_notify_resume+0xac/0x1a0
[ 23.721551] [<ffffffff814d712a>] int_signal+0x12/0x17
[ 23.726786] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)


2014-02-19 07:49:46

by David Rientjes

Subject: Re: mm: OS boot failed when set command-line kmemcheck=1

On Wed, 19 Feb 2014, Xishi Qiu wrote:

> Hi all,
>
> With CONFIG_KMEMCHECK=y and "kmemcheck=1" on the kernel command line,
> the OS fails to boot. The kernel is v3.14.0-rc3.
>
> With "kmemcheck=1 nowatchdog", the OS boots successfully.
>

I have automated kernel boots that have both "kmemcheck=0" and
"kmemcheck=1" as the last parameter in the kernel command line every
night and I've never seen it fail on tip or linux-next before.

So I'm sure I won't be able to reproduce your issue, but it may have
something to do with your bootloader that isn't described above. The
sscanf() really wants to be replaced with kstrtoint().

Could you try this out?

diff --git a/arch/x86/mm/kmemcheck/kmemcheck.c b/arch/x86/mm/kmemcheck/kmemcheck.c
--- a/arch/x86/mm/kmemcheck/kmemcheck.c
+++ b/arch/x86/mm/kmemcheck/kmemcheck.c
@@ -78,10 +78,16 @@ early_initcall(kmemcheck_init);
  */
 static int __init param_kmemcheck(char *str)
 {
+	int val;
+	int ret;
+
 	if (!str)
 		return -EINVAL;
 
-	sscanf(str, "%d", &kmemcheck_enabled);
+	ret = kstrtoint(str, 0, &val);
+	if (ret)
+		return ret;
+	kmemcheck_enabled = val;
 	return 0;
 }

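For comparison: sscanf(str, "%d", ...) stops at the first non-digit, so "1abc" quietly yields 1, and on a failed conversion it leaves the target untouched while the handler still returns 0. The sketch below is a hypothetical userspace approximation of the stricter kstrtoint() contract the patch switches to. parse_int_strict() is an illustrative name, not a kernel function, and the rules listed in its comment are assumptions based on kstrtoint()'s documented behaviour rather than a copy of its implementation.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * parse_int_strict(): hypothetical userspace approximation of the
 * kernel's kstrtoint(str, base, &res), NOT the kernel implementation.
 * Assumed rules:
 *  - the entire string must be a number (one trailing '\n' is allowed),
 *  - base 0 auto-detects "0x" (hex) and a leading "0" (octal),
 *  - *res is written only on success,
 *  - failures are reported as -EINVAL or -ERANGE.
 */
static int parse_int_strict(const char *s, int base, int *res)
{
	char *end;
	long val;

	/* strtol() silently skips leading whitespace; kstrtoint() rejects it. */
	if (*s == '\0' || isspace((unsigned char)*s))
		return -EINVAL;

	errno = 0;
	val = strtol(s, &end, base);
	if (end == s)
		return -EINVAL;		/* no digits at all, e.g. "abc" */
	if (*end == '\n')
		end++;			/* tolerate a single trailing newline */
	if (*end != '\0')
		return -EINVAL;		/* trailing garbage, e.g. "1abc" */
	if (errno == ERANGE || val < INT_MIN || val > INT_MAX)
		return -ERANGE;		/* does not fit in an int */

	*res = (int)val;
	return 0;
}
```

As in the patch, the caller parses into a local and commits the value only when the return code is 0, so a malformed kmemcheck= argument leaves kmemcheck_enabled at its default instead of being half-parsed.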
2014-02-19 09:39:28

by Xishi Qiu

Subject: Re: mm: OS boot failed when set command-line kmemcheck=1

On 2014/2/19 15:49, David Rientjes wrote:

> On Wed, 19 Feb 2014, Xishi Qiu wrote:
>
>> Hi all,
>>
>> With CONFIG_KMEMCHECK=y and "kmemcheck=1" on the kernel command line,
>> the OS fails to boot. The kernel is v3.14.0-rc3.
>>
>> With "kmemcheck=1 nowatchdog", the OS boots successfully.
>>
>
> I have automated kernel boots that have both "kmemcheck=0" and
> "kmemcheck=1" as the last parameter in the kernel command line every
> night and I've never seen it fail on tip or linux-next before.
>
> So I'm sure I won't be able to reproduce your issue, but it may have
> something to do with your bootloader that isn't described above. The
> sscanf() really wants to be replaced with kstrtoint().
>
> Could you try this out?
>

Hi David,

Thank you for your suggestion, but it still fails.
Here is a warning; I don't know whether it is related to my hardware.
With "kmemcheck=1 nowatchdog", it boots.

code:
...
	pte = kmemcheck_pte_lookup(address);
	if (!pte)
		return false;

	WARN_ON_ONCE(in_nmi());

	if (error_code & 2)
...

log:
[ 10.920683] WARNING: CPU: 0 PID: 1 at arch/x86/mm/kmemcheck/kmemcheck.c:640 kmemcheck_fault+0xb1/0xc0()
[ 10.920684] Modules linked in:
[ 10.920686] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.14.0-rc3-0.1-default+ #3
[ 10.920687] Hardware name: Huawei Technologies Co., Ltd. Tecal RH2285 V2-24S/BC11SRSC1, BIOS RMISV055 02/02/2013
[ 10.920690] 0000000000000280 ffff88085f807678 ffffffff814ca491 ffff88085f8076b8
[ 10.920693] ffffffff8104ce97 0000000000000000 ffff88085f807838 ffff88085f4205d4
[ 10.920695] 0000000000000000 0000000000000000 ffff88085f4205d4 ffff88085f8076c8
[ 10.920695] Call Trace:
[ 10.920701] <NMI> [<ffffffff814ca491>] dump_stack+0x6a/0x79
[ 10.920705] [<ffffffff8104ce97>] warn_slowpath_common+0x87/0xb0
[ 10.920707] [<ffffffff8104ced5>] warn_slowpath_null+0x15/0x20
[ 10.920710] [<ffffffff810452c1>] kmemcheck_fault+0xb1/0xc0
[ 10.920714] [<ffffffff814d262b>] __do_page_fault+0x39b/0x4c0
[ 10.920718] [<ffffffff81272cd2>] ? put_dec+0x72/0x90
[ 10.920720] [<ffffffff812730ba>] ? number+0x33a/0x360
[ 10.920723] [<ffffffff814d2829>] do_page_fault+0x9/0x10
[ 10.920726] [<ffffffff814cf222>] page_fault+0x22/0x30
[ 10.920731] [<ffffffff81348b4c>] ? vt_console_print+0x8c/0x400
[ 10.920733] [<ffffffff81348b2c>] ? vt_console_print+0x6c/0x400
[ 10.920737] [<ffffffff8109cd9b>] ? msg_print_text+0x18b/0x1f0
[ 10.920739] [<ffffffff8109bed1>] call_console_drivers+0xc1/0xe0
[ 10.920741] [<ffffffff8109d746>] console_unlock+0x236/0x280
[ 10.920744] [<ffffffff8109e095>] vprintk_emit+0x2b5/0x450
[ 10.920746] [<ffffffff810452c1>] ? kmemcheck_fault+0xb1/0xc0
[ 10.920748] [<ffffffff814ca3f7>] printk+0x4a/0x4c
[ 10.920750] [<ffffffff810452c1>] ? kmemcheck_fault+0xb1/0xc0
[ 10.920753] [<ffffffff8104ce4e>] warn_slowpath_common+0x3e/0xb0
[ 10.920755] [<ffffffff8104ced5>] warn_slowpath_null+0x15/0x20
[ 10.920757] [<ffffffff810452c1>] kmemcheck_fault+0xb1/0xc0
[ 10.920760] [<ffffffff814d262b>] __do_page_fault+0x39b/0x4c0
[ 10.920763] [<ffffffff814d2829>] do_page_fault+0x9/0x10
[ 10.920765] [<ffffffff814cf222>] page_fault+0x22/0x30
[ 10.920769] [<ffffffff81015b52>] ? x86_perf_event_update+0x2/0x70
[ 10.920772] [<ffffffff8101de21>] ? intel_pmu_save_and_restart+0x11/0x50
[ 10.920774] [<ffffffff8101eb02>] intel_pmu_handle_irq+0x142/0x3a0
[ 10.920777] [<ffffffff814d0655>] perf_event_nmi_handler+0x35/0x60
[ 10.920779] [<ffffffff814cfe83>] nmi_handle+0x63/0x150
[ 10.920782] [<ffffffff814cffd3>] default_do_nmi+0x63/0x290
[ 10.920784] [<ffffffff814d02a8>] do_nmi+0xa8/0xe0
[ 10.920786] [<ffffffff814cf527>] end_repeat_nmi+0x1e/0x2e
[ 10.920789] [<ffffffff814cf0f0>] ? retint_signal+0x78/0x78
[ 10.920791] [<ffffffff814cf0f0>] ? retint_signal+0x78/0x78
[ 10.920793] [<ffffffff814cf0f0>] ? retint_signal+0x78/0x78
[ 10.920799] <<EOE>> <#DB> [<ffffffff81306b53>] ? acpi_ns_walk_namespace+0x98/0x251

2014-02-19 22:14:46

by David Rientjes

Subject: [patch] x86, kmemcheck: Use kstrtoint() instead of sscanf()

Kmemcheck should use the preferred interface for parsing command line
arguments, kstrto*(), rather than sscanf() itself. Use it appropriately.

Signed-off-by: David Rientjes <[email protected]>
---
arch/x86/mm/kmemcheck/kmemcheck.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/kmemcheck/kmemcheck.c b/arch/x86/mm/kmemcheck/kmemcheck.c
--- a/arch/x86/mm/kmemcheck/kmemcheck.c
+++ b/arch/x86/mm/kmemcheck/kmemcheck.c
@@ -78,10 +78,16 @@ early_initcall(kmemcheck_init);
  */
 static int __init param_kmemcheck(char *str)
 {
+	int val;
+	int ret;
+
 	if (!str)
 		return -EINVAL;
 
-	sscanf(str, "%d", &kmemcheck_enabled);
+	ret = kstrtoint(str, 0, &val);
+	if (ret)
+		return ret;
+	kmemcheck_enabled = val;
 	return 0;
 }

2014-02-19 22:24:45

by David Rientjes

Subject: Re: mm: OS boot failed when set command-line kmemcheck=1

On Wed, 19 Feb 2014, Xishi Qiu wrote:

> Here is a warning; I don't know whether it is related to my hardware.
> With "kmemcheck=1 nowatchdog", it boots.
>
> code:
> ...
> pte = kmemcheck_pte_lookup(address);
> if (!pte)
> return false;
>
> WARN_ON_ONCE(in_nmi());
>
> if (error_code & 2)
> ...
>

I added some perf events and kmemcheck people to the cc list. This
appears to happen during an NMI when faulting in struct perf_sample_data
data.

2014-02-26 08:14:41

by Xishi Qiu

Subject: Re: mm: OS boot failed when set command-line kmemcheck=1

On 2014/2/20 6:24, David Rientjes wrote:

> I added some perf events and kmemcheck people to the cc list. This
> appears to happen during an NMI when faulting in struct perf_sample_data
> data.
>

Hi David,

Could you try our config, or send us yours?

Thanks,
Xishi Qiu

...
CONFIG_HAVE_ARCH_KMEMCHECK=y
CONFIG_KMEMCHECK=y
# CONFIG_KMEMCHECK_DISABLED_BY_DEFAULT is not set
# CONFIG_KMEMCHECK_ENABLED_BY_DEFAULT is not set
CONFIG_KMEMCHECK_ONESHOT_BY_DEFAULT=y
CONFIG_KMEMCHECK_QUEUE_SIZE=64
CONFIG_KMEMCHECK_SHADOW_COPY_SHIFT=5
CONFIG_KMEMCHECK_PARTIAL_OK=y
# CONFIG_KMEMCHECK_BITOPS_OK is not set
...



2014-02-26 08:43:24

by Peter Zijlstra

Subject: Re: mm: OS boot failed when set command-line kmemcheck=1

On Wed, Feb 19, 2014 at 02:24:41PM -0800, David Rientjes wrote:
> On Wed, 19 Feb 2014, Xishi Qiu wrote:
>
> > Here is a warning; I don't know whether it is related to my hardware.
> > With "kmemcheck=1 nowatchdog", it boots.
> >
> > code:
> > ...
> > pte = kmemcheck_pte_lookup(address);
> > if (!pte)
> > return false;
> >
> > WARN_ON_ONCE(in_nmi());
> >
> > if (error_code & 2)
> > ...

That code seems to assume NMI context cannot fault; that has not been
true for a while now (since v3.9 or thereabouts).

> > [ 10.920757] [<ffffffff810452c1>] kmemcheck_fault+0xb1/0xc0
> > [ 10.920760] [<ffffffff814d262b>] __do_page_fault+0x39b/0x4c0
> > [ 10.920763] [<ffffffff814d2829>] do_page_fault+0x9/0x10
> > [ 10.920765] [<ffffffff814cf222>] page_fault+0x22/0x30
> > [ 10.920774] [<ffffffff8101eb02>] intel_pmu_handle_irq+0x142/0x3a0
> > [ 10.920777] [<ffffffff814d0655>] perf_event_nmi_handler+0x35/0x60
> > [ 10.920779] [<ffffffff814cfe83>] nmi_handle+0x63/0x150
> > [ 10.920782] [<ffffffff814cffd3>] default_do_nmi+0x63/0x290
> > [ 10.920784] [<ffffffff814d02a8>] do_nmi+0xa8/0xe0
> > [ 10.920786] [<ffffffff814cf527>] end_repeat_nmi+0x1e/0x2e

And this does indeed show a fault from NMI context, which is totally
expected.

kmemcheck needs to be fixed; but I've no clue how any of that works.

2014-02-26 10:14:45

by Vegard Nossum

Subject: Re: mm: OS boot failed when set command-line kmemcheck=1

On 26 February 2014 09:43, Peter Zijlstra <[email protected]> wrote:
> On Wed, Feb 19, 2014 at 02:24:41PM -0800, David Rientjes wrote:
>> On Wed, 19 Feb 2014, Xishi Qiu wrote:
>>
>> > Here is a warning; I don't know whether it is related to my hardware.
>> > With "kmemcheck=1 nowatchdog", it boots.
>> >
>> > code:
>> > ...
>> > pte = kmemcheck_pte_lookup(address);
>> > if (!pte)
>> > return false;
>> >
>> > WARN_ON_ONCE(in_nmi());
>> >
>> > if (error_code & 2)
>> > ...
>
> That code seems to assume NMI context cannot fault; this is false since
> a while back (v3.9 or thereabouts).
>
>> > [ 10.920757] [<ffffffff810452c1>] kmemcheck_fault+0xb1/0xc0
>> > [ 10.920760] [<ffffffff814d262b>] __do_page_fault+0x39b/0x4c0
>> > [ 10.920763] [<ffffffff814d2829>] do_page_fault+0x9/0x10
>> > [ 10.920765] [<ffffffff814cf222>] page_fault+0x22/0x30
>> > [ 10.920774] [<ffffffff8101eb02>] intel_pmu_handle_irq+0x142/0x3a0
>> > [ 10.920777] [<ffffffff814d0655>] perf_event_nmi_handler+0x35/0x60
>> > [ 10.920779] [<ffffffff814cfe83>] nmi_handle+0x63/0x150
>> > [ 10.920782] [<ffffffff814cffd3>] default_do_nmi+0x63/0x290
>> > [ 10.920784] [<ffffffff814d02a8>] do_nmi+0xa8/0xe0
>> > [ 10.920786] [<ffffffff814cf527>] end_repeat_nmi+0x1e/0x2e
>
> And this does indeed show a fault from NMI context; which is totally
> expected.
>
> kmemcheck needs to be fixed; but I've no clue how any of that works.

IIRC the reason we don't support page faults in NMI context is that we
may already be handling an existing fault (or trap) when the NMI hits.
So that would mess up kmemcheck's working state. I don't really see
that anything has changed in this respect lately, so it could always
have been broken.

I think the way we dealt with this before was just to make sure that
NMI handlers don't access any kmemcheck-tracked memory (i.e. to make
sure that all memory touched by NMI handlers has been marked NOTRACK).
And the purpose of this warning is just to tell us that something
inside an NMI triggered a page fault (in this specific case, it seems
to be intel_pmu_handle_irq).

I guess there are two ways forward:

- create a stack of things that kmemcheck is working on, so that we
handle recursive page faults
- try to figure out why intel_pmu_handle_irq() faults and add a
(kmemcheck-specific) workaround for it
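A minimal sketch of the first option, under assumed names: struct work, single_begin(), stack_begin() and friends are hypothetical and do not mirror kmemcheck's real per-CPU context. It shows why one shared slot breaks when a fault taken inside an NMI interrupts a fault already in progress, and how a small stack gives each nesting level its own slot. The depth of 4 assumes at most one fault per context level (task, softirq, hardirq, NMI).

```c
/*
 * Hypothetical model of a fault handler's per-CPU "working state".
 * Each fault records the address it is handling and must read the same
 * address back when it finishes (e.g. to re-hide the page afterwards).
 */
#define MAX_DEPTH 4	/* assumed: one slot per context level */

struct work {
	unsigned long addr;
};

/* Broken variant: one slot shared by every nesting level. */
static struct work single_slot;

static void single_begin(unsigned long addr)
{
	single_slot.addr = addr;	/* a nested fault overwrites this */
}

static unsigned long single_end(void)
{
	return single_slot.addr;
}

/* Stacked variant: each nested fault pushes its own slot. */
struct work_stack {
	struct work slot[MAX_DEPTH];
	int depth;
};

/* Returns 0 on success, -1 if nesting is deeper than the stack allows. */
static int stack_begin(struct work_stack *ws, unsigned long addr)
{
	if (ws->depth >= MAX_DEPTH)
		return -1;
	ws->slot[ws->depth++].addr = addr;
	return 0;
}

/*
 * Pops the innermost fault and returns the address it was handling.
 * Caller must pair it with a successful stack_begin().
 */
static unsigned long stack_end(struct work_stack *ws)
{
	return ws->slot[--ws->depth].addr;
}
```

With the single slot, the interrupted outer fault reads back the NMI's address; with the stack, each level gets its own address back in LIFO order.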

Incidentally, do you remember what exactly changed wrt page faults in
NMI context?


Vegard

2014-02-26 10:30:45

by Peter Zijlstra

Subject: Re: mm: OS boot failed when set command-line kmemcheck=1

On Wed, Feb 26, 2014 at 11:14:41AM +0100, Vegard Nossum wrote:
> On 26 February 2014 09:43, Peter Zijlstra <[email protected]> wrote:
> IIRC the reason we don't support page faults in NMI context is that we
> may already be handling an existing fault (or trap) when the NMI hits.
> So that would mess up kmemcheck's working state. I don't really see
> that anything has changed in this respect lately, so it could always
> have been broken.
>
> I think the way we dealt with this before was just to make sure that
> NMI handlers don't access any kmemcheck-tracked memory (i.e. to make
> sure that all memory touched by NMI handlers has been marked NOTRACK).
> And the purpose of this warning is just to tell us that something
> inside an NMI triggered a page fault (in this specific case, it seems
> to be intel_pmu_handle_irq).
>
> I guess there are two ways forward:
>
> - create a stack of things that kmemcheck is working on, so that we
> handle recursive page faults

That's what perf and ftrace do. We keep a 4-layer stack using things
like:

static inline int get_recursion_context(int *recursion)
{
	int rctx;

	if (in_nmi())
		rctx = 3;
	else if (in_irq())
		rctx = 2;
	else if (in_softirq())
		rctx = 1;
	else
		rctx = 0;

	if (recursion[rctx])
		return -1;

	recursion[rctx]++;
	barrier();

	return rctx;
}
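The helper above can be exercised outside the kernel with a small harness. In this sketch the context predicates in_nmi(), in_irq() and in_softirq() are simulated through a cur_ctx variable (an assumption for illustration), barrier() is mapped to the C11 compiler-only fence atomic_signal_fence(), and put_recursion_context() is the matching release helper. In the kernel the recursion array is per-CPU; this single-threaded model ignores that.

```c
#include <stdatomic.h>

/* Simulated execution context; stands in for the kernel's preempt count. */
enum sim_ctx { CTX_TASK, CTX_SOFTIRQ, CTX_IRQ, CTX_NMI };
static enum sim_ctx cur_ctx = CTX_TASK;

#define in_nmi()	(cur_ctx == CTX_NMI)
#define in_irq()	(cur_ctx == CTX_IRQ)
#define in_softirq()	(cur_ctx == CTX_SOFTIRQ)
#define barrier()	atomic_signal_fence(memory_order_seq_cst)

/* Claims the slot for the current context; -1 means "already inside". */
static inline int get_recursion_context(int *recursion)
{
	int rctx;

	if (in_nmi())
		rctx = 3;
	else if (in_irq())
		rctx = 2;
	else if (in_softirq())
		rctx = 1;
	else
		rctx = 0;

	if (recursion[rctx])
		return -1;

	recursion[rctx]++;
	barrier();

	return rctx;
}

/* Releases the slot taken by a successful get_recursion_context(). */
static inline void put_recursion_context(int *recursion, int rctx)
{
	barrier();
	recursion[rctx]--;
}
```

Each context level can be entered once: a second entry from the same level gets -1 and should bail out, while an NMI arriving on top of task-level work still gets its own slot (3).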

> - try to figure out why intel_pmu_handle_irq() faults and add a
> (kmemcheck-specific) workaround for it

Well, that's easy: we access user memory, which might or might not be
there.

We do this for a number of reasons. One is to read the code and decode
the current basic block to find the previous instruction (see
intel_pmu_pebs_fixup_ip()); another is to try and walk the userspace
frame pointers (see perf_callchain_user()).

In all cases we use 'atomic' accesses, which return short copies on
failure: we take the fault handler's exception path and abort the
operation.
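The short-copy-and-abort pattern can be modelled in a few lines. This is a hypothetical userspace sketch, not perf's code: user_mem and mapped_len simulate a partially mapped user stack, copy_sim() stands in for an atomic user copy that reports the number of bytes it could not copy (the same return convention as the kernel's copy_from_user() family), and walk_user_frames() follows saved frame pointers, assuming the usual x86-64 layout of saved frame pointer followed by return address, stopping at the first short copy instead of faulting pages in.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simulated user address space: only the first mapped_len bytes are readable. */
static unsigned char user_mem[256];
static size_t mapped_len;		/* assumed <= sizeof(user_mem) */

/* Atomic-style copy: never "faults", returns the bytes NOT copied. */
static size_t copy_sim(void *dst, size_t uoff, size_t n)
{
	size_t ok = 0;

	if (uoff < mapped_len) {
		ok = mapped_len - uoff;
		if (ok > n)
			ok = n;
		memcpy(dst, user_mem + uoff, ok);
	}
	return n - ok;			/* nonzero means a short copy */
}

/* Frame record: saved frame pointer, then return address. */
struct frame {
	uint64_t next_fp;		/* offset of the caller's frame in user_mem */
	uint64_t ret_ip;
};

/*
 * Walks the frame-pointer chain starting at offset fp, storing up to max
 * return addresses. Any short copy aborts the walk; entries collected so
 * far are kept. Returns the number of entries stored.
 */
static size_t walk_user_frames(size_t fp, uint64_t *ips, size_t max)
{
	size_t n = 0;
	struct frame f;

	while (n < max && fp != 0) {
		if (copy_sim(&f, fp, sizeof(f)) != 0)
			break;		/* would have faulted: give up */
		ips[n++] = f.ret_ip;
		if (f.next_fp <= fp)
			break;		/* callers live higher up; refuse loops */
		fp = (size_t)f.next_fp;
	}
	return n;
}
```

The walker never retries or sleeps: a frame that is not fully readable simply ends the callchain, which is what makes the access safe from NMI context.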

> Incidentally, do you remember what exactly changed wrt page faults in
> NMI context?

Sure; commit 3f3c8b8c4b2a34776c3470142a7c8baafcda6eb0 and a fair number
of 'fixes', in particular: 7fbb98c5cb07563d3ee08714073a8e5452a96be2.

These patches made it possible to take faults from NMI context.
Previously this was not possible because we return from the fault using
IRET, and IRET unconditionally re-enables NMIs, which is a bit of a
problem when you're still running the NMI handler.