Fei has reported that KASAN triggers during apply_alternatives() on
5-level paging machine:
BUG: KASAN: out-of-bounds in rcu_is_watching (./arch/x86/include/asm/atomic.h:23 ./include/linux/atomic/atomic-arch-fallback.h:444 ./include/linux/context_tracking.h:122 kernel/rcu/tree.c:699)
Read of size 4 at addr ff110003ee6419a0 by task swapper/0/0
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.6.0-rc5 #12
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:107)
print_report (mm/kasan/report.c:365 mm/kasan/report.c:475)
? __phys_addr (arch/x86/mm/physaddr.h:7 arch/x86/mm/physaddr.c:28)
? kasan_addr_to_slab (./include/linux/mm.h:1265 (discriminator 1) mm/kasan/../slab.h:213 (discriminator 1) mm/kasan/common.c:36 (discriminator 1))
kasan_report (mm/kasan/report.c:590)
? rcu_is_watching (./arch/x86/include/asm/atomic.h:23 ./include/linux/atomic/atomic-arch-fallback.h:444 ./include/linux/context_tracking.h:122 kernel/rcu/tree.c:699)
? rcu_is_watching (./arch/x86/include/asm/atomic.h:23 ./include/linux/atomic/atomic-arch-fallback.h:444 ./include/linux/context_tracking.h:122 kernel/rcu/tree.c:699)
? apply_alternatives (arch/x86/kernel/alternative.c:415 (discriminator 1))
__asan_load4 (mm/kasan/generic.c:259)
rcu_is_watching (./arch/x86/include/asm/atomic.h:23 ./include/linux/atomic/atomic-arch-fallback.h:444 ./include/linux/context_tracking.h:122 kernel/rcu/tree.c:699)
? text_poke_early (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:77 ./arch/x86/include/asm/irqflags.h:135 arch/x86/kernel/alternative.c:1675)
trace_hardirqs_on (./include/trace/events/preemptirq.h:40 (discriminator 2) ./include/trace/events/preemptirq.h:40 (discriminator 2) kernel/trace/trace_preemptirq.c:56 (discriminator 2))
? __asan_load4 (./arch/x86/include/asm/cpufeature.h:171 mm/kasan/kasan.h:306 mm/kasan/generic.c:175 mm/kasan/generic.c:259)
text_poke_early (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:77 ./arch/x86/include/asm/irqflags.h:135 arch/x86/kernel/alternative.c:1675)
apply_alternatives (arch/x86/kernel/alternative.c:415 (discriminator 1))
? __asan_load4 (./arch/x86/include/asm/cpufeature.h:171 mm/kasan/kasan.h:306 mm/kasan/generic.c:175 mm/kasan/generic.c:259)
? __pfx_apply_alternatives (arch/x86/kernel/alternative.c:400)
? __pfx_apply_returns (arch/x86/kernel/alternative.c:720)
? __this_cpu_preempt_check (lib/smp_processor_id.c:67)
? _sub_I_65535_1 (init/main.c:1573)
? int3_selftest_ip (arch/x86/kernel/alternative.c:1496)
? __pfx_int3_selftest (arch/x86/kernel/alternative.c:1496)
? lockdep_hardirqs_on (kernel/locking/lockdep.c:4422)
? fpu__init_cpu_generic (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:77 ./arch/x86/include/asm/irqflags.h:135 ./arch/x86/include/asm/tlbflush.h:47 arch/x86/kernel/fpu/init.c:30)
alternative_instructions (arch/x86/kernel/alternative.c:1618)
arch_cpu_finalize_init (arch/x86/kernel/cpu/common.c:2404)
start_kernel (init/main.c:1037)
x86_64_start_reservations (arch/x86/kernel/head64.c:544)
x86_64_start_kernel (arch/x86/kernel/head64.c:486 (discriminator 5))
secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:433)
</TASK>
The buggy address belongs to the physical page:
page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3ee641
flags: 0x20000000004000(reserved|node=0|zone=2)
page_type: 0xffffffff()
raw: 0020000000004000 ffd400000fb99048 ffd400000fb99048 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ff110003ee641880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ff110003ee641900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ff110003ee641980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
^
ff110003ee641a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ff110003ee641a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
On machines with 5-level paging, cpu_feature_enabled(X86_FEATURE_LA57)
got patched. It includes KASAN code, where KASAN_SHADOW_START depends on
__VIRTUAL_MASK_SHIFT, which is defined with the cpu_feature_enabled().
KASAN gets confused when apply_alternatives() patches the
KASAN_SHADOW_START users. A test patch that makes KASAN_SHADOW_START
static, by replacing __VIRTUAL_MASK_SHIFT with 56, fixes the issue.
Disable KASAN while kernel patches alternatives.
Signed-off-by: Kirill A. Shutemov <[email protected]>
Reported-by: Fei Yang <[email protected]>
Fixes: 6657fca06e3f ("x86/mm: Allow to boot without LA57 if CONFIG_X86_5LEVEL=y")
Cc: [email protected]
---
v2:
- Move kasan_disable/_enable_current() to cover whole loop, not only
text_poke_early();
- Adjust commit message.
---
arch/x86/kernel/alternative.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 517ee01503be..b4cc4d7c0825 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -403,6 +403,17 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,
u8 insn_buff[MAX_PATCH_LEN];
DPRINTK(ALT, "alt table %px, -> %px", start, end);
+
+ /*
+ * In the case CONFIG_X86_5LEVEL=y, KASAN_SHADOW_START is defined using
+ * cpu_feature_enabled(X86_FEATURE_LA57) and is therefore patched here.
+ * During the process, KASAN becomes confused and triggers
+ * a false-positive out-of-bound report.
+ *
+ * Disable KASAN until the patching is complete.
+ */
+ kasan_disable_current();
+
/*
* The scan order should be from start to end. A later scanned
* alternative code can overwrite previously scanned alternative code.
@@ -452,6 +463,8 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,
text_poke_early(instr, insn_buff, insn_buff_sz);
}
+
+ kasan_enable_current();
}
static inline bool is_jcc32(struct insn *insn)
--
2.41.0
On Wed, Oct 11, 2023 at 09:58:49AM +0300, Kirill A. Shutemov wrote:
> Fei has reported that KASAN triggers during apply_alternatives() on
> 5-level paging machine:
>
Urgh @ KASAN splat, can't we summarize that?
>
> On machines with 5-level paging, cpu_feature_enabled(X86_FEATURE_LA57)
> got patched. It includes KASAN code, where KASAN_SHADOW_START depends on
> __VIRTUAL_MASK_SHIFT, which is defined with the cpu_feature_enabled().
>
> KASAN gets confused when apply_alternatives() patches the
> KASAN_SHADOW_START users. A test patch that makes KASAN_SHADOW_START
> static, by replacing __VIRTUAL_MASK_SHIFT with 56, fixes the issue.
>
> Disable KASAN while kernel patches alternatives.
>
> Signed-off-by: Kirill A. Shutemov <[email protected]>
> Reported-by: Fei Yang <[email protected]>
> Fixes: 6657fca06e3f ("x86/mm: Allow to boot without LA57 if CONFIG_X86_5LEVEL=y")
> Cc: [email protected]
> ---
>
> v2:
> - Move kasan_disable/_enable_current() to cover whole loop, not only
> text_poke_early();
> - Adjust commit message.
>
> ---
> arch/x86/kernel/alternative.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index 517ee01503be..b4cc4d7c0825 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -403,6 +403,17 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,
> u8 insn_buff[MAX_PATCH_LEN];
>
> DPRINTK(ALT, "alt table %px, -> %px", start, end);
> +
> + /*
> + * In the case CONFIG_X86_5LEVEL=y, KASAN_SHADOW_START is defined using
> + * cpu_feature_enabled(X86_FEATURE_LA57) and is therefore patched here.
> + * During the process, KASAN becomes confused and triggers
because of partial LA57 convertion ..
> + * a false-positive out-of-bound report.
> + *
> + * Disable KASAN until the patching is complete.
> + */
> + kasan_disable_current();
> +
> /*
> * The scan order should be from start to end. A later scanned
> * alternative code can overwrite previously scanned alternative code.
> @@ -452,6 +463,8 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,
>
> text_poke_early(instr, insn_buff, insn_buff_sz);
> }
> +
> + kasan_enable_current();
> }
Other than that, ACK.
* Peter Zijlstra <[email protected]> wrote:
> > DPRINTK(ALT, "alt table %px, -> %px", start, end);
> > +
> > + /*
> > + * In the case CONFIG_X86_5LEVEL=y, KASAN_SHADOW_START is defined using
> > + * cpu_feature_enabled(X86_FEATURE_LA57) and is therefore patched here.
> > + * During the process, KASAN becomes confused and triggers
>
> because of partial LA57 convertion ..
Not all LA57 related sites are patched yet at this point, and KASAN sees
a weird & broken mixture of LA48 and LA57 runtime semantics, right?
Ie. as far as KASAN is concerned, the LA48 -> LA57 behavioral switchover
must be atomic, but during the kernel code patching process it isn't.
Thanks,
Ingo
On Wed, Oct 11, 2023 at 10:11:46AM +0200, Ingo Molnar wrote:
>
> * Peter Zijlstra <[email protected]> wrote:
>
> > > DPRINTK(ALT, "alt table %px, -> %px", start, end);
> > > +
> > > + /*
> > > + * In the case CONFIG_X86_5LEVEL=y, KASAN_SHADOW_START is defined using
> > > + * cpu_feature_enabled(X86_FEATURE_LA57) and is therefore patched here.
> > > + * During the process, KASAN becomes confused and triggers
> >
> > because of partial LA57 convertion ..
>
> Not all LA57 related sites are patched yet at this point, and KASAN sees
> a weird & broken mixture of LA48 and LA57 runtime semantics, right?
>
> Ie. as far as KASAN is concerned, the LA48 -> LA57 behavioral switchover
> must be atomic, but during the kernel code patching process it isn't.
Yep, half-way through the patching it observes inconsistencies and goes
WTF :-)
On Wed, Oct 11, 2023 at 09:46:16AM +0200, Peter Zijlstra wrote:
> On Wed, Oct 11, 2023 at 09:58:49AM +0300, Kirill A. Shutemov wrote:
> > Fei has reported that KASAN triggers during apply_alternatives() on
> > 5-level paging machine:
> >
>
> Urgh @ KASAN splat, can't we summarize that?
What about this?
BUG: KASAN: out-of-bounds in rcu_is_watching
Read of size 4 at addr ff110003ee6419a0 by task swapper/0/0
...
__asan_load4
rcu_is_watching
? text_poke_early
trace_hardirqs_on
? __asan_load4
text_poke_early
apply_alternatives
...
Is it enough details or I overdid summarization?
> > diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> > index 517ee01503be..b4cc4d7c0825 100644
> > --- a/arch/x86/kernel/alternative.c
> > +++ b/arch/x86/kernel/alternative.c
> > @@ -403,6 +403,17 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,
> > u8 insn_buff[MAX_PATCH_LEN];
> >
> > DPRINTK(ALT, "alt table %px, -> %px", start, end);
> > +
> > + /*
> > + * In the case CONFIG_X86_5LEVEL=y, KASAN_SHADOW_START is defined using
> > + * cpu_feature_enabled(X86_FEATURE_LA57) and is therefore patched here.
> > + * During the process, KASAN becomes confused and triggers
>
> because of partial LA57 convertion ..
>
> > + * a false-positive out-of-bound report.
> > + *
> > + * Disable KASAN until the patching is complete.
> > + */
> > + kasan_disable_current();
> > +
> > /*
/*
* In the case CONFIG_X86_5LEVEL=y, KASAN_SHADOW_START is defined using
* cpu_feature_enabled(X86_FEATURE_LA57) and is therefore patched here.
* During the process, KASAN becomes confused seeing partial LA57
* conversion and triggers a false-positive out-of-bound report.
*
* Disable KASAN until the patching is complete.
*/
Looks good?
If yes, I will submit v3 with your Ack.
--
Kiryl Shutsemau / Kirill A. Shutemov
* Kirill A. Shutemov <[email protected]> wrote:
> On Wed, Oct 11, 2023 at 09:46:16AM +0200, Peter Zijlstra wrote:
> > On Wed, Oct 11, 2023 at 09:58:49AM +0300, Kirill A. Shutemov wrote:
> > > Fei has reported that KASAN triggers during apply_alternatives() on
> > > 5-level paging machine:
> > >
> >
> > Urgh @ KASAN splat, can't we summarize that?
>
> What about this?
>
> BUG: KASAN: out-of-bounds in rcu_is_watching
> Read of size 4 at addr ff110003ee6419a0 by task swapper/0/0
> ...
> __asan_load4
> rcu_is_watching
> ? text_poke_early
> trace_hardirqs_on
> ? __asan_load4
> text_poke_early
> apply_alternatives
> ...
>
> Is it enough details or I overdid summarization?
No, that's perfect IMO. I'd even leave out the unreliable '?' entries:
> BUG: KASAN: out-of-bounds in rcu_is_watching
> Read of size 4 at addr ff110003ee6419a0 by task swapper/0/0
> ...
> __asan_load4
> rcu_is_watching
> trace_hardirqs_on
> text_poke_early
> apply_alternatives
> ...
... or so.
> /*
> * In the case CONFIG_X86_5LEVEL=y, KASAN_SHADOW_START is defined using
> * cpu_feature_enabled(X86_FEATURE_LA57) and is therefore patched here.
> * During the process, KASAN becomes confused seeing partial LA57
> * conversion and triggers a false-positive out-of-bound report.
> *
> * Disable KASAN until the patching is complete.
> */
>
> Looks good?
LGTM.
Thanks,
Ingo