2017-06-01 14:54:47

by Andrey Ryabinin

Subject: Re: KASAN vs. boot-time switching between 4- and 5-level paging

On 05/29/2017 03:46 PM, Andrey Ryabinin wrote:
> On 05/29/2017 02:45 PM, Andrey Ryabinin wrote:
>>>>>> Looks like KASAN will be a problem for boot-time paging mode switching.
>>>>>> It wants to know CONFIG_KASAN_SHADOW_OFFSET at compile-time to pass to
>>>>>> gcc -fasan-shadow-offset=. But this value varies between paging modes...
>>>>>>
>>>>>> I don't see how to solve it. Folks, any ideas?
>>>>>
>>>>> +kasan-dev
>>>>>
>>>>> I wonder if we can use the same offset for both modes. If we use
>>>>> 0xFFDFFC0000000000 as start of shadow for 5 levels, then the same
>>>>> offset that we use for 4 levels (0xdffffc0000000000) will also work
>>>>> for 5 levels. Namely, ending of 5 level shadow will overlap with 4
>>>>> level mapping (both end at 0xfffffbffffffffff), but 5 level mapping
>>>>> extends towards lower addresses. The current 5 level start of shadow
>>>>> is actually close -- 0xffd8000000000000 and it seems that the required
>>>>> space after it is unused at the moment (at least looking at mm.txt).
>>>>> So just try to move it to 0xFFDFFC0000000000?
>>>>>
>>>>
>>>> Yeah, this should work, but note that 0xFFDFFC0000000000 is not PGDIR aligned address. Our init code
>>>> assumes that kasan shadow stars and ends on the PGDIR aligned address.
>>>> Fortunately this is fixable, we'd need two more pages for page tables to map unaligned start/end
>>>> of the shadow.
>>>
>>> I think we can extend the shadow backwards (to the current address),
>>> provided that it does not affect shadow offset that we pass to
>>> compiler.
>>
>> I thought about this. We can round down shadow start to 0xffdf000000000000, but we can't
>> round up shadow end, because in that case shadow would end at 0xffffffffffffffff.
>> So we still need at least one more page to cover unaligned end.
>
> Actually, I'm wrong here. I assumed that we would need an additional page to store p4d entries,
> but in fact we don't need it, as such page should already exist. It's the same last pgd where kernel image
> is mapped.
>


Something like below might work. It's just a proposal to demonstrate the idea, so some of the code might look ugly.
And it's only build-tested.
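
For reference, a quick sanity check of the arithmetic behind the shared offset (a standalone sketch, not part of the patch; it only assumes the usual KASAN mapping shadow(addr) = (addr >> 3) + KASAN_SHADOW_OFFSET and the start of the kernel half in each mode):

#include <stdio.h>

int main(void)
{
	unsigned long long offset = 0xdffffc0000000000ULL;	/* KASAN_SHADOW_OFFSET */
	unsigned long long kern4  = 0xffff800000000000ULL;	/* start of kernel half, 4-level */
	unsigned long long kern5  = 0xff00000000000000ULL;	/* start of kernel half, 5-level */

	/* shadow(addr) = (addr >> 3) + offset */
	printf("4-level shadow start: %#llx\n", (kern4 >> 3) + offset); /* 0xffffec0000000000 */
	printf("5-level shadow start: %#llx\n", (kern5 >> 3) + offset); /* 0xffdffc0000000000 */

	/*
	 * shadow(2^64) = 2^61 + offset = 0xfffffc0000000000 in both modes,
	 * i.e. both shadows end at 0xfffffbffffffffff, as noted above.
	 */
	return 0;
}

So 0xFFDFFC0000000000 as the 5-level shadow start indeed falls out of the single 0xdffffc0000000000 offset.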

Based on top of: git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git la57/integration


---
arch/x86/Kconfig | 1 -
arch/x86/mm/kasan_init_64.c | 74 ++++++++++++++++++++++++++++++++-------------
2 files changed, 53 insertions(+), 22 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 11bd0498f64c..3456f2fdda52 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -291,7 +291,6 @@ config ARCH_SUPPORTS_DEBUG_PAGEALLOC
config KASAN_SHADOW_OFFSET
hex
depends on KASAN
- default 0xdff8000000000000 if X86_5LEVEL
default 0xdffffc0000000000

config HAVE_INTEL_TXT
diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index 88215ac16b24..d79a7ea83d05 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -15,6 +15,10 @@
extern pgd_t early_top_pgt[PTRS_PER_PGD];
extern struct range pfn_mapped[E820_MAX_ENTRIES];

+#if CONFIG_PGTABLE_LEVELS == 5
+p4d_t tmp_p4d_table[PTRS_PER_P4D] __initdata __aligned(PAGE_SIZE);
+#endif
+
static int __init map_range(struct range *range)
{
unsigned long start;
@@ -35,8 +39,9 @@ static void __init clear_pgds(unsigned long start,
unsigned long end)
{
pgd_t *pgd;
+ unsigned long pgd_end = end & PGDIR_MASK;

- for (; start < end; start += PGDIR_SIZE) {
+ for (; start < pgd_end; start += PGDIR_SIZE) {
pgd = pgd_offset_k(start);
/*
* With folded p4d, pgd_clear() is nop, use p4d_clear()
@@ -47,29 +52,50 @@ static void __init clear_pgds(unsigned long start,
else
pgd_clear(pgd);
}
+
+ pgd = pgd_offset_k(start);
+ for (; start < end; start += P4D_SIZE)
+ p4d_clear(p4d_offset(pgd, start));
+}
+
+static void __init kasan_early_p4d_populate(pgd_t *pgd,
+ unsigned long addr,
+ unsigned long end)
+{
+ p4d_t *p4d;
+ unsigned long next;
+
+ if (pgd_none(*pgd))
+ set_pgd(pgd, __pgd(_KERNPG_TABLE | __pa_nodebug(kasan_zero_p4d)));
+
+ /* early p4d_offset()
+ * TODO: we need helpers for this shit
+ */
+ if (CONFIG_PGTABLE_LEVELS == 5)
+ p4d = ((p4d_t*)((__pa_nodebug(pgd->pgd) & PTE_PFN_MASK) + __START_KERNEL_map))
+ + p4d_index(addr);
+ else
+ p4d = (p4d_t*)pgd;
+ do {
+ next = p4d_addr_end(addr, end);
+
+ if (p4d_none(*p4d))
+ set_p4d(p4d, __p4d(_KERNPG_TABLE |
+ __pa_nodebug(kasan_zero_pud)));
+ } while (p4d++, addr = next, addr != end && p4d_none(*p4d));
}

static void __init kasan_map_early_shadow(pgd_t *pgd)
{
- int i;
- unsigned long start = KASAN_SHADOW_START;
+ unsigned long addr = KASAN_SHADOW_START & PGDIR_MASK;
unsigned long end = KASAN_SHADOW_END;
+ unsigned long next;

- for (i = pgd_index(start); start < end; i++) {
- switch (CONFIG_PGTABLE_LEVELS) {
- case 4:
- pgd[i] = __pgd(__pa_nodebug(kasan_zero_pud) |
- _KERNPG_TABLE);
- break;
- case 5:
- pgd[i] = __pgd(__pa_nodebug(kasan_zero_p4d) |
- _KERNPG_TABLE);
- break;
- default:
- BUILD_BUG();
- }
- start += PGDIR_SIZE;
- }
+ pgd = pgd + pgd_index(addr);
+ do {
+ next = pgd_addr_end(addr, end);
+ kasan_early_p4d_populate(pgd, addr, next);
+ } while (pgd++, addr = next, addr != end);
}

#ifdef CONFIG_KASAN_INLINE
@@ -120,14 +146,20 @@ void __init kasan_init(void)
#ifdef CONFIG_KASAN_INLINE
register_die_notifier(&kasan_die_notifier);
#endif
-
memcpy(early_top_pgt, init_top_pgt, sizeof(early_top_pgt));
+#if CONFIG_PGTABLE_LEVELS == 5
+ memcpy(tmp_p4d_table, (void*)pgd_page_vaddr(*pgd_offset_k(KASAN_SHADOW_END)),
+ sizeof(tmp_p4d_table));
+ set_pgd(&early_top_pgt[pgd_index(KASAN_SHADOW_END)],
+ __pgd(__pa(tmp_p4d_table) | _KERNPG_TABLE));
+#endif
+
load_cr3(early_top_pgt);
__flush_tlb_all();

- clear_pgds(KASAN_SHADOW_START, KASAN_SHADOW_END);
+ clear_pgds(KASAN_SHADOW_START & PGDIR_MASK, KASAN_SHADOW_END);

- kasan_populate_zero_shadow((void *)KASAN_SHADOW_START,
+ kasan_populate_zero_shadow((void *)(KASAN_SHADOW_START & PGDIR_MASK),
kasan_mem_to_shadow((void *)PAGE_OFFSET));

for (i = 0; i < E820_MAX_ENTRIES; i++) {
--
2.13.0


2017-07-10 12:33:56

by Kirill A. Shutemov

Subject: Re: KASAN vs. boot-time switching between 4- and 5-level paging

On Thu, Jun 01, 2017 at 05:56:30PM +0300, Andrey Ryabinin wrote:
> On 05/29/2017 03:46 PM, Andrey Ryabinin wrote:
> > On 05/29/2017 02:45 PM, Andrey Ryabinin wrote:
> >>>>>> Looks like KASAN will be a problem for boot-time paging mode switching.
> >>>>>> It wants to know CONFIG_KASAN_SHADOW_OFFSET at compile-time to pass to
> >>>>>> gcc -fasan-shadow-offset=. But this value varies between paging modes...
> >>>>>>
> >>>>>> I don't see how to solve it. Folks, any ideas?
> >>>>>
> >>>>> +kasan-dev
> >>>>>
> >>>>> I wonder if we can use the same offset for both modes. If we use
> >>>>> 0xFFDFFC0000000000 as start of shadow for 5 levels, then the same
> >>>>> offset that we use for 4 levels (0xdffffc0000000000) will also work
> >>>>> for 5 levels. Namely, ending of 5 level shadow will overlap with 4
> >>>>> level mapping (both end at 0xfffffbffffffffff), but 5 level mapping
> >>>>> extends towards lower addresses. The current 5 level start of shadow
> >>>>> is actually close -- 0xffd8000000000000 and it seems that the required
> >>>>> space after it is unused at the moment (at least looking at mm.txt).
> >>>>> So just try to move it to 0xFFDFFC0000000000?
> >>>>>
> >>>>
> >>>> Yeah, this should work, but note that 0xFFDFFC0000000000 is not PGDIR aligned address. Our init code
> >>>> assumes that kasan shadow stars and ends on the PGDIR aligned address.
> >>>> Fortunately this is fixable, we'd need two more pages for page tables to map unaligned start/end
> >>>> of the shadow.
> >>>
> >>> I think we can extend the shadow backwards (to the current address),
> >>> provided that it does not affect shadow offset that we pass to
> >>> compiler.
> >>
> >> I thought about this. We can round down shadow start to 0xffdf000000000000, but we can't
> >> round up shadow end, because in that case shadow would end at 0xffffffffffffffff.
> >> So we still need at least one more page to cover unaligned end.
> >
> > Actually, I'm wrong here. I assumed that we would need an additional page to store p4d entries,
> > but in fact we don't need it, as such page should already exist. It's the same last pgd where kernel image
> > is mapped.
> >
>
>
> Something like bellow might work. It's just a proposal to demonstrate the idea, so some code might look ugly.
> And it's only build-tested.

[Sorry for loong delay.]

The patch works for me for legacy boot. But it breaks EFI boot with
5-level paging. And I struggle to understand why.

What I see is many page faults at mm/kasan/kasan.c:758 --
"DEFINE_ASAN_LOAD_STORE(4)". Handling one of them I get double-fault at
arch/x86/kernel/head_64.S:298 -- "pushq %r14", which ends up with triple
fault.

Any ideas?

If you want to play with this by yourself, qemu supports la57 -- use
-cpu "qemu64,+la57".

--
Kirill A. Shutemov

2017-07-10 12:43:45

by Dmitry Vyukov

Subject: Re: KASAN vs. boot-time switching between 4- and 5-level paging

On Mon, Jul 10, 2017 at 2:33 PM, Kirill A. Shutemov
<[email protected]> wrote:
> On Thu, Jun 01, 2017 at 05:56:30PM +0300, Andrey Ryabinin wrote:
>> On 05/29/2017 03:46 PM, Andrey Ryabinin wrote:
>> > On 05/29/2017 02:45 PM, Andrey Ryabinin wrote:
>> >>>>>> Looks like KASAN will be a problem for boot-time paging mode switching.
>> >>>>>> It wants to know CONFIG_KASAN_SHADOW_OFFSET at compile-time to pass to
>> >>>>>> gcc -fasan-shadow-offset=. But this value varies between paging modes...
>> >>>>>>
>> >>>>>> I don't see how to solve it. Folks, any ideas?
>> >>>>>
>> >>>>> +kasan-dev
>> >>>>>
>> >>>>> I wonder if we can use the same offset for both modes. If we use
>> >>>>> 0xFFDFFC0000000000 as start of shadow for 5 levels, then the same
>> >>>>> offset that we use for 4 levels (0xdffffc0000000000) will also work
>> >>>>> for 5 levels. Namely, ending of 5 level shadow will overlap with 4
>> >>>>> level mapping (both end at 0xfffffbffffffffff), but 5 level mapping
>> >>>>> extends towards lower addresses. The current 5 level start of shadow
>> >>>>> is actually close -- 0xffd8000000000000 and it seems that the required
>> >>>>> space after it is unused at the moment (at least looking at mm.txt).
>> >>>>> So just try to move it to 0xFFDFFC0000000000?
>> >>>>>
>> >>>>
>> >>>> Yeah, this should work, but note that 0xFFDFFC0000000000 is not PGDIR aligned address. Our init code
>> >>>> assumes that kasan shadow stars and ends on the PGDIR aligned address.
>> >>>> Fortunately this is fixable, we'd need two more pages for page tables to map unaligned start/end
>> >>>> of the shadow.
>> >>>
>> >>> I think we can extend the shadow backwards (to the current address),
>> >>> provided that it does not affect shadow offset that we pass to
>> >>> compiler.
>> >>
>> >> I thought about this. We can round down shadow start to 0xffdf000000000000, but we can't
>> >> round up shadow end, because in that case shadow would end at 0xffffffffffffffff.
>> >> So we still need at least one more page to cover unaligned end.
>> >
>> > Actually, I'm wrong here. I assumed that we would need an additional page to store p4d entries,
>> > but in fact we don't need it, as such page should already exist. It's the same last pgd where kernel image
>> > is mapped.
>> >
>>
>>
>> Something like bellow might work. It's just a proposal to demonstrate the idea, so some code might look ugly.
>> And it's only build-tested.
>
> [Sorry for loong delay.]
>
> The patch works for me for legacy boot. But it breaks EFI boot with
> 5-level paging. And I struggle to understand why.
>
> What I see is many page faults at mm/kasan/kasan.c:758 --
> "DEFINE_ASAN_LOAD_STORE(4)". Handling one of them I get double-fault at
> arch/x86/kernel/head_64.S:298 -- "pushq %r14", which ends up with triple
> fault.
>
> Any ideas?


Just playing the role of the rubber duck:
- what is the fault address?
- is it within the shadow range?
- was the shadow mapped already?


> If you want to play with this by yourself, qemu supports la57 -- use
> -cpu "qemu64,+la57".
>
> --
> Kirill A. Shutemov

2017-07-10 14:17:21

by Kirill A. Shutemov

Subject: Re: KASAN vs. boot-time switching between 4- and 5-level paging

On Mon, Jul 10, 2017 at 02:43:17PM +0200, Dmitry Vyukov wrote:
> On Mon, Jul 10, 2017 at 2:33 PM, Kirill A. Shutemov
> <[email protected]> wrote:
> > On Thu, Jun 01, 2017 at 05:56:30PM +0300, Andrey Ryabinin wrote:
> >> On 05/29/2017 03:46 PM, Andrey Ryabinin wrote:
> >> > On 05/29/2017 02:45 PM, Andrey Ryabinin wrote:
> >> >>>>>> Looks like KASAN will be a problem for boot-time paging mode switching.
> >> >>>>>> It wants to know CONFIG_KASAN_SHADOW_OFFSET at compile-time to pass to
> >> >>>>>> gcc -fasan-shadow-offset=. But this value varies between paging modes...
> >> >>>>>>
> >> >>>>>> I don't see how to solve it. Folks, any ideas?
> >> >>>>>
> >> >>>>> +kasan-dev
> >> >>>>>
> >> >>>>> I wonder if we can use the same offset for both modes. If we use
> >> >>>>> 0xFFDFFC0000000000 as start of shadow for 5 levels, then the same
> >> >>>>> offset that we use for 4 levels (0xdffffc0000000000) will also work
> >> >>>>> for 5 levels. Namely, ending of 5 level shadow will overlap with 4
> >> >>>>> level mapping (both end at 0xfffffbffffffffff), but 5 level mapping
> >> >>>>> extends towards lower addresses. The current 5 level start of shadow
> >> >>>>> is actually close -- 0xffd8000000000000 and it seems that the required
> >> >>>>> space after it is unused at the moment (at least looking at mm.txt).
> >> >>>>> So just try to move it to 0xFFDFFC0000000000?
> >> >>>>>
> >> >>>>
> >> >>>> Yeah, this should work, but note that 0xFFDFFC0000000000 is not PGDIR aligned address. Our init code
> >> >>>> assumes that kasan shadow stars and ends on the PGDIR aligned address.
> >> >>>> Fortunately this is fixable, we'd need two more pages for page tables to map unaligned start/end
> >> >>>> of the shadow.
> >> >>>
> >> >>> I think we can extend the shadow backwards (to the current address),
> >> >>> provided that it does not affect shadow offset that we pass to
> >> >>> compiler.
> >> >>
> >> >> I thought about this. We can round down shadow start to 0xffdf000000000000, but we can't
> >> >> round up shadow end, because in that case shadow would end at 0xffffffffffffffff.
> >> >> So we still need at least one more page to cover unaligned end.
> >> >
> >> > Actually, I'm wrong here. I assumed that we would need an additional page to store p4d entries,
> >> > but in fact we don't need it, as such page should already exist. It's the same last pgd where kernel image
> >> > is mapped.
> >> >
> >>
> >>
> >> Something like bellow might work. It's just a proposal to demonstrate the idea, so some code might look ugly.
> >> And it's only build-tested.
> >
> > [Sorry for loong delay.]
> >
> > The patch works for me for legacy boot. But it breaks EFI boot with
> > 5-level paging. And I struggle to understand why.
> >
> > What I see is many page faults at mm/kasan/kasan.c:758 --
> > "DEFINE_ASAN_LOAD_STORE(4)". Handling one of them I get double-fault at
> > arch/x86/kernel/head_64.S:298 -- "pushq %r14", which ends up with triple
> > fault.
> >
> > Any ideas?
>
>
> Just playing the role of the rubber duck:
> - what is the fault address?
> - is it within the shadow range?
> - was the shadow mapped already?

I misread the trace. The initial fault is at arch/x86/kernel/head_64.S:270,
which is ".endr" in the definition of early_idt_handler_array.

The fault address for all three faults is 0xffffffff7ffffff8, which is
outside shadow range. It's just before kernel text mapping.

Codewise, it happens in load_ucode_bsp() -- after kasan_early_init(), but
before kasan_init().

--
Kirill A. Shutemov

2017-07-10 15:56:43

by Andy Lutomirski

Subject: Re: KASAN vs. boot-time switching between 4- and 5-level paging



> On Jul 10, 2017, at 7:17 AM, Kirill A. Shutemov <[email protected]> wrote:
>
>> On Mon, Jul 10, 2017 at 02:43:17PM +0200, Dmitry Vyukov wrote:
>> On Mon, Jul 10, 2017 at 2:33 PM, Kirill A. Shutemov
>> <[email protected]> wrote:
>>> On Thu, Jun 01, 2017 at 05:56:30PM +0300, Andrey Ryabinin wrote:
>>>>> On 05/29/2017 03:46 PM, Andrey Ryabinin wrote:
>>>>> On 05/29/2017 02:45 PM, Andrey Ryabinin wrote:
>>>>>>>>>> Looks like KASAN will be a problem for boot-time paging mode switching.
>>>>>>>>>> It wants to know CONFIG_KASAN_SHADOW_OFFSET at compile-time to pass to
>>>>>>>>>> gcc -fasan-shadow-offset=. But this value varies between paging modes...
>>>>>>>>>>
>>>>>>>>>> I don't see how to solve it. Folks, any ideas?
>>>>>>>>>
>>>>>>>>> +kasan-dev
>>>>>>>>>
>>>>>>>>> I wonder if we can use the same offset for both modes. If we use
>>>>>>>>> 0xFFDFFC0000000000 as start of shadow for 5 levels, then the same
>>>>>>>>> offset that we use for 4 levels (0xdffffc0000000000) will also work
>>>>>>>>> for 5 levels. Namely, ending of 5 level shadow will overlap with 4
>>>>>>>>> level mapping (both end at 0xfffffbffffffffff), but 5 level mapping
>>>>>>>>> extends towards lower addresses. The current 5 level start of shadow
>>>>>>>>> is actually close -- 0xffd8000000000000 and it seems that the required
>>>>>>>>> space after it is unused at the moment (at least looking at mm.txt).
>>>>>>>>> So just try to move it to 0xFFDFFC0000000000?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Yeah, this should work, but note that 0xFFDFFC0000000000 is not PGDIR aligned address. Our init code
>>>>>>>> assumes that kasan shadow stars and ends on the PGDIR aligned address.
>>>>>>>> Fortunately this is fixable, we'd need two more pages for page tables to map unaligned start/end
>>>>>>>> of the shadow.
>>>>>>>
>>>>>>> I think we can extend the shadow backwards (to the current address),
>>>>>>> provided that it does not affect shadow offset that we pass to
>>>>>>> compiler.
>>>>>>
>>>>>> I thought about this. We can round down shadow start to 0xffdf000000000000, but we can't
>>>>>> round up shadow end, because in that case shadow would end at 0xffffffffffffffff.
>>>>>> So we still need at least one more page to cover unaligned end.
>>>>>
>>>>> Actually, I'm wrong here. I assumed that we would need an additional page to store p4d entries,
>>>>> but in fact we don't need it, as such page should already exist. It's the same last pgd where kernel image
>>>>> is mapped.
>>>>>
>>>>
>>>>
>>>> Something like bellow might work. It's just a proposal to demonstrate the idea, so some code might look ugly.
>>>> And it's only build-tested.
>>>
>>> [Sorry for loong delay.]
>>>
>>> The patch works for me for legacy boot. But it breaks EFI boot with
>>> 5-level paging. And I struggle to understand why.
>>>
>>> What I see is many page faults at mm/kasan/kasan.c:758 --
>>> "DEFINE_ASAN_LOAD_STORE(4)". Handling one of them I get double-fault at
>>> arch/x86/kernel/head_64.S:298 -- "pushq %r14", which ends up with triple
>>> fault.
>>>
>>> Any ideas?
>>
>>
>> Just playing the role of the rubber duck:
>> - what is the fault address?
>> - is it within the shadow range?
>> - was the shadow mapped already?
>
> I misread trace. The initial fault is at arch/x86/kernel/head_64.S:270,
> which is ".endr" in definition of early_idt_handler_array.
>
> The fault address for all three faults is 0xffffffff7ffffff8, which is
> outside shadow range. It's just before kernel text mapping.
>
> Codewise, it happens in load_ucode_bsp() -- after kasan_early_init(), but
> before kasan_init().

My theory is that, in 5 level mode, the early IDT code isn't all mapped in the page tables. This could sometimes be papered over by lazy page table setup, but lazy setup can't handle faults in the page fault code or data structures.

EFI sometimes uses separate page tables, which could contribute.

>
> --
> Kirill A. Shutemov

2017-07-10 16:55:26

by Andrey Ryabinin

Subject: Re: KASAN vs. boot-time switching between 4- and 5-level paging



On 07/10/2017 03:33 PM, Kirill A. Shutemov wrote:

>
> [Sorry for loong delay.]
>
> The patch works for me for legacy boot. But it breaks EFI boot with
> 5-level paging. And I struggle to understand why.
>
> What I see is many page faults at mm/kasan/kasan.c:758 --
> "DEFINE_ASAN_LOAD_STORE(4)". Handling one of them I get double-fault at
> arch/x86/kernel/head_64.S:298 -- "pushq %r14", which ends up with triple
> fault.
>
> Any ideas?
>
> If you want to play with this by yourself, qemu supports la57 -- use
> -cpu "qemu64,+la57".
>

I'll have a look.

2017-07-10 18:47:10

by Kirill A. Shutemov

Subject: Re: KASAN vs. boot-time switching between 4- and 5-level paging

On Mon, Jul 10, 2017 at 08:56:37AM -0700, Andy Lutomirski wrote:
>
>
> > On Jul 10, 2017, at 7:17 AM, Kirill A. Shutemov <[email protected]> wrote:
> >
> >> On Mon, Jul 10, 2017 at 02:43:17PM +0200, Dmitry Vyukov wrote:
> >> On Mon, Jul 10, 2017 at 2:33 PM, Kirill A. Shutemov
> >> <[email protected]> wrote:
> >>> On Thu, Jun 01, 2017 at 05:56:30PM +0300, Andrey Ryabinin wrote:
> >>>>> On 05/29/2017 03:46 PM, Andrey Ryabinin wrote:
> >>>>> On 05/29/2017 02:45 PM, Andrey Ryabinin wrote:
> >>>>>>>>>> Looks like KASAN will be a problem for boot-time paging mode switching.
> >>>>>>>>>> It wants to know CONFIG_KASAN_SHADOW_OFFSET at compile-time to pass to
> >>>>>>>>>> gcc -fasan-shadow-offset=. But this value varies between paging modes...
> >>>>>>>>>>
> >>>>>>>>>> I don't see how to solve it. Folks, any ideas?
> >>>>>>>>>
> >>>>>>>>> +kasan-dev
> >>>>>>>>>
> >>>>>>>>> I wonder if we can use the same offset for both modes. If we use
> >>>>>>>>> 0xFFDFFC0000000000 as start of shadow for 5 levels, then the same
> >>>>>>>>> offset that we use for 4 levels (0xdffffc0000000000) will also work
> >>>>>>>>> for 5 levels. Namely, ending of 5 level shadow will overlap with 4
> >>>>>>>>> level mapping (both end at 0xfffffbffffffffff), but 5 level mapping
> >>>>>>>>> extends towards lower addresses. The current 5 level start of shadow
> >>>>>>>>> is actually close -- 0xffd8000000000000 and it seems that the required
> >>>>>>>>> space after it is unused at the moment (at least looking at mm.txt).
> >>>>>>>>> So just try to move it to 0xFFDFFC0000000000?
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Yeah, this should work, but note that 0xFFDFFC0000000000 is not PGDIR aligned address. Our init code
> >>>>>>>> assumes that kasan shadow stars and ends on the PGDIR aligned address.
> >>>>>>>> Fortunately this is fixable, we'd need two more pages for page tables to map unaligned start/end
> >>>>>>>> of the shadow.
> >>>>>>>
> >>>>>>> I think we can extend the shadow backwards (to the current address),
> >>>>>>> provided that it does not affect shadow offset that we pass to
> >>>>>>> compiler.
> >>>>>>
> >>>>>> I thought about this. We can round down shadow start to 0xffdf000000000000, but we can't
> >>>>>> round up shadow end, because in that case shadow would end at 0xffffffffffffffff.
> >>>>>> So we still need at least one more page to cover unaligned end.
> >>>>>
> >>>>> Actually, I'm wrong here. I assumed that we would need an additional page to store p4d entries,
> >>>>> but in fact we don't need it, as such page should already exist. It's the same last pgd where kernel image
> >>>>> is mapped.
> >>>>>
> >>>>
> >>>>
> >>>> Something like bellow might work. It's just a proposal to demonstrate the idea, so some code might look ugly.
> >>>> And it's only build-tested.
> >>>
> >>> [Sorry for loong delay.]
> >>>
> >>> The patch works for me for legacy boot. But it breaks EFI boot with
> >>> 5-level paging. And I struggle to understand why.
> >>>
> >>> What I see is many page faults at mm/kasan/kasan.c:758 --
> >>> "DEFINE_ASAN_LOAD_STORE(4)". Handling one of them I get double-fault at
> >>> arch/x86/kernel/head_64.S:298 -- "pushq %r14", which ends up with triple
> >>> fault.
> >>>
> >>> Any ideas?
> >>
> >>
> >> Just playing the role of the rubber duck:
> >> - what is the fault address?
> >> - is it within the shadow range?
> >> - was the shadow mapped already?
> >
> > I misread trace. The initial fault is at arch/x86/kernel/head_64.S:270,
> > which is ".endr" in definition of early_idt_handler_array.
> >
> > The fault address for all three faults is 0xffffffff7ffffff8, which is
> > outside shadow range. It's just before kernel text mapping.
> >
> > Codewise, it happens in load_ucode_bsp() -- after kasan_early_init(), but
> > before kasan_init().
>
> My theory is that, in 5 level mode, the early IDT code isn't all mapped
> in the page tables. This could sometimes be papered over by lazy page
> table setup, but lazy setup can't handle faults in the page fault code
> or data structures.
>
> EFI sometimes uses separate page tables, which could contribute.

As far as I can see all involved code is within the same page:

(gdb) p/x &x86_64_start_kernel
$1 = 0xffffffff84bad2ae
(gdb) p/x &early_idt_handler_array
$2 = 0xffffffff84bad000
(gdb) p/x &early_idt_handler_common
$3 = 0xffffffff84bad120
(gdb) p/x &early_make_pgtable
$4 = 0xffffffff84bad3b4

--
Kirill A. Shutemov

2017-07-10 20:07:37

by Andy Lutomirski

Subject: Re: KASAN vs. boot-time switching between 4- and 5-level paging

On Mon, Jul 10, 2017 at 11:47 AM, Kirill A. Shutemov
<[email protected]> wrote:
> On Mon, Jul 10, 2017 at 08:56:37AM -0700, Andy Lutomirski wrote:
>>
>>
>> > On Jul 10, 2017, at 7:17 AM, Kirill A. Shutemov <[email protected]> wrote:
>> >
>> >> On Mon, Jul 10, 2017 at 02:43:17PM +0200, Dmitry Vyukov wrote:
>> >> On Mon, Jul 10, 2017 at 2:33 PM, Kirill A. Shutemov
>> >> <[email protected]> wrote:
>> >>> On Thu, Jun 01, 2017 at 05:56:30PM +0300, Andrey Ryabinin wrote:
>> >>>>> On 05/29/2017 03:46 PM, Andrey Ryabinin wrote:
>> >>>>> On 05/29/2017 02:45 PM, Andrey Ryabinin wrote:
>> >>>>>>>>>> Looks like KASAN will be a problem for boot-time paging mode switching.
>> >>>>>>>>>> It wants to know CONFIG_KASAN_SHADOW_OFFSET at compile-time to pass to
>> >>>>>>>>>> gcc -fasan-shadow-offset=. But this value varies between paging modes...
>> >>>>>>>>>>
>> >>>>>>>>>> I don't see how to solve it. Folks, any ideas?
>> >>>>>>>>>
>> >>>>>>>>> +kasan-dev
>> >>>>>>>>>
>> >>>>>>>>> I wonder if we can use the same offset for both modes. If we use
>> >>>>>>>>> 0xFFDFFC0000000000 as start of shadow for 5 levels, then the same
>> >>>>>>>>> offset that we use for 4 levels (0xdffffc0000000000) will also work
>> >>>>>>>>> for 5 levels. Namely, ending of 5 level shadow will overlap with 4
>> >>>>>>>>> level mapping (both end at 0xfffffbffffffffff), but 5 level mapping
>> >>>>>>>>> extends towards lower addresses. The current 5 level start of shadow
>> >>>>>>>>> is actually close -- 0xffd8000000000000 and it seems that the required
>> >>>>>>>>> space after it is unused at the moment (at least looking at mm.txt).
>> >>>>>>>>> So just try to move it to 0xFFDFFC0000000000?
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>>>>> Yeah, this should work, but note that 0xFFDFFC0000000000 is not PGDIR aligned address. Our init code
>> >>>>>>>> assumes that kasan shadow stars and ends on the PGDIR aligned address.
>> >>>>>>>> Fortunately this is fixable, we'd need two more pages for page tables to map unaligned start/end
>> >>>>>>>> of the shadow.
>> >>>>>>>
>> >>>>>>> I think we can extend the shadow backwards (to the current address),
>> >>>>>>> provided that it does not affect shadow offset that we pass to
>> >>>>>>> compiler.
>> >>>>>>
>> >>>>>> I thought about this. We can round down shadow start to 0xffdf000000000000, but we can't
>> >>>>>> round up shadow end, because in that case shadow would end at 0xffffffffffffffff.
>> >>>>>> So we still need at least one more page to cover unaligned end.
>> >>>>>
>> >>>>> Actually, I'm wrong here. I assumed that we would need an additional page to store p4d entries,
>> >>>>> but in fact we don't need it, as such page should already exist. It's the same last pgd where kernel image
>> >>>>> is mapped.
>> >>>>>
>> >>>>
>> >>>>
>> >>>> Something like bellow might work. It's just a proposal to demonstrate the idea, so some code might look ugly.
>> >>>> And it's only build-tested.
>> >>>
>> >>> [Sorry for loong delay.]
>> >>>
>> >>> The patch works for me for legacy boot. But it breaks EFI boot with
>> >>> 5-level paging. And I struggle to understand why.
>> >>>
>> >>> What I see is many page faults at mm/kasan/kasan.c:758 --
>> >>> "DEFINE_ASAN_LOAD_STORE(4)". Handling one of them I get double-fault at
>> >>> arch/x86/kernel/head_64.S:298 -- "pushq %r14", which ends up with triple
>> >>> fault.
>> >>>
>> >>> Any ideas?
>> >>
>> >>
>> >> Just playing the role of the rubber duck:
>> >> - what is the fault address?
>> >> - is it within the shadow range?
>> >> - was the shadow mapped already?
>> >
>> > I misread trace. The initial fault is at arch/x86/kernel/head_64.S:270,
>> > which is ".endr" in definition of early_idt_handler_array.
>> >
>> > The fault address for all three faults is 0xffffffff7ffffff8, which is
>> > outside shadow range. It's just before kernel text mapping.
>> >
>> > Codewise, it happens in load_ucode_bsp() -- after kasan_early_init(), but
>> > before kasan_init().
>>
>> My theory is that, in 5 level mode, the early IDT code isn't all mapped
>> in the page tables. This could sometimes be papered over by lazy page
>> table setup, but lazy setup can't handle faults in the page fault code
>> or data structures.
>>
>> EFI sometimes uses separate page tables, which could contribute.
>
> As far as I can see all involved code is within the same page:
>
> (gdb) p/x &x86_64_start_kernel
> $1 = 0xffffffff84bad2ae
> (gdb) p/x &early_idt_handler_array
> $2 = 0xffffffff84bad000
> (gdb) p/x &early_idt_handler_common
> $3 = 0xffffffff84bad120
> (gdb) p/x &early_make_pgtable
> $4 = 0xffffffff84bad3b4
>

Can you give the disassembly of the backtrace lines? Blaming the
.endr doesn't make much sense to me.

Or maybe Andrey will figure it out quickly.

2017-07-10 21:24:09

by Kirill A. Shutemov

Subject: Re: KASAN vs. boot-time switching between 4- and 5-level paging

On Mon, Jul 10, 2017 at 01:07:13PM -0700, Andy Lutomirski wrote:
> Can you give the disassembly of the backtrace lines? Blaming the
> .endr doesn't make much sense to me.

I don't have backtrace. It's before printk() is functional. I only see
triple fault and reboot.

I had to rely on qemu tracing and gdb.

--
Kirill A. Shutemov

2017-07-11 00:31:02

by Andy Lutomirski

Subject: Re: KASAN vs. boot-time switching between 4- and 5-level paging

On Mon, Jul 10, 2017 at 2:24 PM, Kirill A. Shutemov
<[email protected]> wrote:
> On Mon, Jul 10, 2017 at 01:07:13PM -0700, Andy Lutomirski wrote:
>> Can you give the disassembly of the backtrace lines? Blaming the
>> .endr doesn't make much sense to me.
>
> I don't have backtrace. It's before printk() is functional. I only see
> triple fault and reboot.
>
> I had to rely on qemu tracing and gdb.

Can you ask GDB or objtool to disassemble around those addresses? Can
you also attach the big dump that QEMU throws out that shows register
state? In particular, CR2, CR3, and CR4 could be useful.

--Andy

2017-07-11 10:35:55

by Kirill A. Shutemov

Subject: Re: KASAN vs. boot-time switching between 4- and 5-level paging

On Mon, Jul 10, 2017 at 05:30:38PM -0700, Andy Lutomirski wrote:
> On Mon, Jul 10, 2017 at 2:24 PM, Kirill A. Shutemov
> <[email protected]> wrote:
> > On Mon, Jul 10, 2017 at 01:07:13PM -0700, Andy Lutomirski wrote:
> >> Can you give the disassembly of the backtrace lines? Blaming the
> >> .endr doesn't make much sense to me.
> >
> > I don't have backtrace. It's before printk() is functional. I only see
> > triple fault and reboot.
> >
> > I had to rely on qemu tracing and gdb.
>
> Can you ask GDB or objtool to disassemble around those addresses? Can
> you also attach the big dump that QEMU throws out that shows register
> state? In particular, CR2, CR3, and CR4 could be useful.

The last three exceptions:

check_exception old: 0xffffffff new 0xe, cr2: 0xffffffff7ffffff8, rip: 0xffffffff84bb3036
RAX=00000000ffffffff RBX=ffffffff800000d8 RCX=ffffffff84be4021 RDX=dffffc0000000000
RSI=0000000000000006 RDI=ffffffff84c57000 RBP=ffffffff800000c8 RSP=ffffffff80000000
R8 =6d756e2032616476 R9 =2f7665642f3d746f R10=6f72203053797474 R11=3d656c6f736e6f63
R12=0000000000000006 R13=000000003fffb000 R14=ffffffff82a07ed8 R15=000000000140008e
RIP=ffffffff84bb3036 RFL=00000006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 00000000 00000000
CS =0010 0000000000000000 ffffffff 00af9b00 DPL=0 CS64 [-RA]
SS =0000 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
DS =0000 0000000000000000 00000000 00000000
FS =0000 0000000000000000 00000000 00000000
GS =0000 ffffffff84b8f000 00000000 00000000
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
GDT= ffffffff84ba1000 0000007f
IDT= ffffffff84d92000 00000fff
CR0=80050033 CR2=ffffffff7ffffff8 CR3=0000000009c58000 CR4=000010a0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000d01

check_exception old: 0xe new 0xe, cr2: 0xffffffff7ffffff8, rip: 0xffffffff84bb3141
RAX=00000000ffffffff RBX=ffffffff800000d8 RCX=ffffffff84be4021 RDX=dffffc0000000000
RSI=0000000000000006 RDI=ffffffff84c57000 RBP=ffffffff800000c8 RSP=ffffffff80000000
R8 =6d756e2032616476 R9 =2f7665642f3d746f R10=6f72203053797474 R11=3d656c6f736e6f63
R12=0000000000000006 R13=000000003fffb000 R14=ffffffff82a07ed8 R15=000000000140008e
RIP=ffffffff84bb3141 RFL=00000006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 00000000 00000000
CS =0010 0000000000000000 ffffffff 00af9b00 DPL=0 CS64 [-RA]
SS =0000 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
DS =0000 0000000000000000 00000000 00000000
FS =0000 0000000000000000 00000000 00000000
GS =0000 ffffffff84b8f000 00000000 00000000
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
GDT= ffffffff84ba1000 0000007f
IDT= ffffffff84d92000 00000fff
CR0=80050033 CR2=ffffffff7ffffff8 CR3=0000000009c58000 CR4=000010a0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000d01

check_exception old: 0x8 new 0xe, cr2: 0xffffffff7ffffff8, rip: 0xffffffff84bb3141
RAX=00000000ffffffff RBX=ffffffff800000d8 RCX=ffffffff84be4021 RDX=dffffc0000000000
RSI=0000000000000006 RDI=ffffffff84c57000 RBP=ffffffff800000c8 RSP=ffffffff80000000
R8 =6d756e2032616476 R9 =2f7665642f3d746f R10=6f72203053797474 R11=3d656c6f736e6f63
R12=0000000000000006 R13=000000003fffb000 R14=ffffffff82a07ed8 R15=000000000140008e
RIP=ffffffff84bb3141 RFL=00000006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 00000000 00000000
CS =0010 0000000000000000 ffffffff 00af9b00 DPL=0 CS64 [-RA]
SS =0000 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
DS =0000 0000000000000000 00000000 00000000
FS =0000 0000000000000000 00000000 00000000
GS =0000 ffffffff84b8f000 00000000 00000000
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
GDT= ffffffff84ba1000 0000007f
IDT= ffffffff84d92000 00000fff
CR0=80050033 CR2=ffffffff7ffffff8 CR3=0000000009c58000 CR4=000010a0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000d01
Triple fault

Dump of assembler code for function early_idt_handler_array:
0xffffffff84bb3000 <+0>: pushq $0x0
0xffffffff84bb3002 <+2>: pushq $0x0
0xffffffff84bb3004 <+4>: jmpq 0xffffffff84bb3120 <early_idt_handler_common>
0xffffffff84bb3009 <+9>: pushq $0x0
0xffffffff84bb300b <+11>: pushq $0x1
0xffffffff84bb300d <+13>: jmpq 0xffffffff84bb3120 <early_idt_handler_common>
0xffffffff84bb3012 <+18>: pushq $0x0
0xffffffff84bb3014 <+20>: pushq $0x2
0xffffffff84bb3016 <+22>: jmpq 0xffffffff84bb3120 <early_idt_handler_common>
0xffffffff84bb301b <+27>: pushq $0x0
0xffffffff84bb301d <+29>: pushq $0x3
0xffffffff84bb301f <+31>: jmpq 0xffffffff84bb3120 <early_idt_handler_common>
0xffffffff84bb3024 <+36>: pushq $0x0
0xffffffff84bb3026 <+38>: pushq $0x4
0xffffffff84bb3028 <+40>: jmpq 0xffffffff84bb3120 <early_idt_handler_common>
0xffffffff84bb302d <+45>: pushq $0x0
0xffffffff84bb302f <+47>: pushq $0x5
0xffffffff84bb3031 <+49>: jmpq 0xffffffff84bb3120 <early_idt_handler_common>
=> 0xffffffff84bb3036 <+54>: pushq $0x0
0xffffffff84bb3038 <+56>: pushq $0x6
0xffffffff84bb303a <+58>: jmpq 0xffffffff84bb3120 <early_idt_handler_common>
0xffffffff84bb303f <+63>: pushq $0x0
0xffffffff84bb3041 <+65>: pushq $0x7
0xffffffff84bb3043 <+67>: jmpq 0xffffffff84bb3120 <early_idt_handler_common>
0xffffffff84bb3048 <+72>: pushq $0x8
0xffffffff84bb304a <+74>: jmpq 0xffffffff84bb3120 <early_idt_handler_common>
0xffffffff84bb304f <+79>: int3
0xffffffff84bb3050 <+80>: int3
...

--
Kirill A. Shutemov

2017-07-11 15:07:27

by Andy Lutomirski

Subject: Re: KASAN vs. boot-time switching between 4- and 5-level paging

On Tue, Jul 11, 2017 at 3:35 AM, Kirill A. Shutemov
<[email protected]> wrote:
> On Mon, Jul 10, 2017 at 05:30:38PM -0700, Andy Lutomirski wrote:
>> On Mon, Jul 10, 2017 at 2:24 PM, Kirill A. Shutemov
>> <[email protected]> wrote:
>> > On Mon, Jul 10, 2017 at 01:07:13PM -0700, Andy Lutomirski wrote:
>> >> Can you give the disassembly of the backtrace lines? Blaming the
>> >> .endr doesn't make much sense to me.
>> >
>> > I don't have backtrace. It's before printk() is functional. I only see
>> > triple fault and reboot.
>> >
>> > I had to rely on qemu tracing and gdb.
>>
>> Can you ask GDB or objtool to disassemble around those addresses? Can
>> you also attach the big dump that QEMU throws out that shows register
>> state? In particular, CR2, CR3, and CR4 could be useful.
>
> The last three execptions:
>
> check_exception old: 0xffffffff new 0xe, cr2: 0xffffffff7ffffff8, rip: 0xffffffff84bb3036
> RAX=00000000ffffffff RBX=ffffffff800000d8 RCX=ffffffff84be4021 RDX=dffffc0000000000
> RSI=0000000000000006 RDI=ffffffff84c57000 RBP=ffffffff800000c8 RSP=ffffffff80000000

So RSP was 0xffffffff80000000, a push happened, and we tried to write
to 0xffffffff7ffffff8, which failed.

> check_exception old: 0xe new 0xe, cr2: 0xffffffff7ffffff8, rip: 0xffffffff84bb3141
> RAX=00000000ffffffff RBX=ffffffff800000d8 RCX=ffffffff84be4021 RDX=dffffc0000000000
> RSI=0000000000000006 RDI=ffffffff84c57000 RBP=ffffffff800000c8 RSP=ffffffff80000000

And #PF doesn't use IST, so it double-faulted.

Either the stack isn't mapped in the page tables, RSP is corrupt, or
there's a genuine stack overflow here.
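
(Indeed the arithmetic matches the dumps above:

	0xffffffff80000000 - 8 = 0xffffffff7ffffff8

which is exactly the CR2 value in all three faults.)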

2017-07-11 15:13:37

by Andrey Ryabinin

Subject: Re: KASAN vs. boot-time switching between 4- and 5-level paging



On 07/11/2017 06:06 PM, Andy Lutomirski wrote:
> On Tue, Jul 11, 2017 at 3:35 AM, Kirill A. Shutemov
> <[email protected]> wrote:
>> On Mon, Jul 10, 2017 at 05:30:38PM -0700, Andy Lutomirski wrote:
>>> On Mon, Jul 10, 2017 at 2:24 PM, Kirill A. Shutemov
>>> <[email protected]> wrote:
>>>> On Mon, Jul 10, 2017 at 01:07:13PM -0700, Andy Lutomirski wrote:
>>>>> Can you give the disassembly of the backtrace lines? Blaming the
>>>>> .endr doesn't make much sense to me.
>>>>
>>>> I don't have backtrace. It's before printk() is functional. I only see
>>>> triple fault and reboot.
>>>>
>>>> I had to rely on qemu tracing and gdb.
>>>
>>> Can you ask GDB or objtool to disassemble around those addresses? Can
>>> you also attach the big dump that QEMU throws out that shows register
>>> state? In particular, CR2, CR3, and CR4 could be useful.
>>
>> The last three execptions:
>>
>> check_exception old: 0xffffffff new 0xe, cr2: 0xffffffff7ffffff8, rip: 0xffffffff84bb3036
>> RAX=00000000ffffffff RBX=ffffffff800000d8 RCX=ffffffff84be4021 RDX=dffffc0000000000
>> RSI=0000000000000006 RDI=ffffffff84c57000 RBP=ffffffff800000c8 RSP=ffffffff80000000
>
> So RSP was 0xffffffff80000000, a push happened, and we tried to write
> to 0xffffffff7ffffff8, which failed.
>
>> check_exception old: 0xe new 0xe, cr2: 0xffffffff7ffffff8, rip: 0xffffffff84bb3141
>> RAX=00000000ffffffff RBX=ffffffff800000d8 RCX=ffffffff84be4021 RDX=dffffc0000000000
>> RSI=0000000000000006 RDI=ffffffff84c57000 RBP=ffffffff800000c8 RSP=ffffffff80000000
>
> And #PF doesn't use IST, so it double-faulted.
>
> Either the stack isn't mapped in the page tables, RSP is corrupt, or
> there's a genuine stack overflow here.
>

I reproduced this, and it is a KASAN bug:

│0xffffffff84864897 <x86_early_init_platform_quirks+5> mov $0xffffffff83f1d0b8,%rdi
│0xffffffff8486489e <x86_early_init_platform_quirks+12> movabs $0xdffffc0000000000,%rax
│0xffffffff848648a8 <x86_early_init_platform_quirks+22> push %rbp
│0xffffffff848648a9 <x86_early_init_platform_quirks+23> mov %rdi,%rdx
│0xffffffff848648ac <x86_early_init_platform_quirks+26> shr $0x3,%rdx
│0xffffffff848648b0 <x86_early_init_platform_quirks+30> mov %rsp,%rbp
>│0xffffffff848648b3 <x86_early_init_platform_quirks+33> mov (%rdx,%rax,1),%al

We crash on the last mov, which is a read from the shadow.

(gdb) p/x $rdx
$1 = 0x1ffffffff07e3a17
(gdb) p/x $rax
$2 = 0xdffffc0000000000

(gdb) p/x 0xdffffc0000000000 + 0x1ffffffff07e3a17
$4 = 0xfffffbfff07e3a17
(gdb) p/x *0xfffffbfff07e3a17
Cannot access memory at address 0xfffffbfff07e3a17
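
For what it's worth, that inaccessible shadow address maps straight back to the address being instrumented (a quick check, assuming shadow(addr) = (addr >> 3) + 0xdffffc0000000000):

	0xfffffbfff07e3a17 - 0xdffffc0000000000 = 0x1ffffffff07e3a17
	0x1ffffffff07e3a17 << 3                 = 0xffffffff83f1d0b8

i.e. the faulting load is the shadow check for 0xffffffff83f1d0b8 (the address loaded into %rdi above), and its shadow page simply isn't mapped at this point.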

2017-07-11 16:43:40

by Andrey Ryabinin

Subject: Re: KASAN vs. boot-time switching between 4- and 5-level paging

On 07/11/2017 06:15 PM, Andrey Ryabinin wrote:
>
> I reproduced this, and this is kasan bug:
>
> │0xffffffff84864897 <x86_early_init_platform_quirks+5> mov $0xffffffff83f1d0b8,%rdi
> │0xffffffff8486489e <x86_early_init_platform_quirks+12> movabs $0xdffffc0000000000,%rax
> │0xffffffff848648a8 <x86_early_init_platform_quirks+22> push %rbp
> │0xffffffff848648a9 <x86_early_init_platform_quirks+23> mov %rdi,%rdx
> │0xffffffff848648ac <x86_early_init_platform_quirks+26> shr $0x3,%rdx
> │0xffffffff848648b0 <x86_early_init_platform_quirks+30> mov %rsp,%rbp
> >│0xffffffff848648b3 <x86_early_init_platform_quirks+33> mov (%rdx,%rax,1),%al
>
> we crash on the last move which is a read from shadow


Ughh, I forgot about phys_base.
Plus, I added KASAN_SANITIZE_paravirt.o := n because with PARAVIRT=y set_pgd() calls native_set_pgd()
from the paravirt.c translation unit.
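
In other words, the pointer we dereference this early has to go through the kernel image mapping, and with a relocated kernel that mapping is virt = phys + __START_KERNEL_map - phys_base. Roughly (illustration only, helper name made up for this sketch, not the exact code in the patch below):

static p4d_t * __init early_p4d_offset(pgd_t *pgd, unsigned long addr)
{
	unsigned long p4d_phys = pgd_val(*pgd) & PTE_PFN_MASK;

	/*
	 * The tables populated this early live in the kernel image, so
	 * translate phys->virt through the kernel text mapping, which is
	 * shifted by phys_base when the kernel is loaded at a non-default
	 * physical address.
	 */
	return (p4d_t *)(p4d_phys + __START_KERNEL_map - phys_base) + p4d_index(addr);
}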



---
arch/x86/kernel/Makefile | 1 +
arch/x86/mm/kasan_init_64.c | 3 ++-
2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 4b994232cb57..5a1f18b87fb2 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -24,6 +24,7 @@ KASAN_SANITIZE_head$(BITS).o := n
KASAN_SANITIZE_dumpstack.o := n
KASAN_SANITIZE_dumpstack_$(BITS).o := n
KASAN_SANITIZE_stacktrace.o := n
+KASAN_SANITIZE_paravirt.o := n

OBJECT_FILES_NON_STANDARD_head_$(BITS).o := y
OBJECT_FILES_NON_STANDARD_relocate_kernel_$(BITS).o := y
diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index d79a7ea83d05..d5743fd37df9 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -72,7 +72,8 @@ static void __init kasan_early_p4d_populate(pgd_t *pgd,
* TODO: we need helpers for this shit
*/
if (CONFIG_PGTABLE_LEVELS == 5)
- p4d = ((p4d_t*)((__pa_nodebug(pgd->pgd) & PTE_PFN_MASK) + __START_KERNEL_map))
+ p4d = ((p4d_t*)((__pa_nodebug(pgd->pgd) & PTE_PFN_MASK)
+ + __START_KERNEL_map - phys_base))
+ p4d_index(addr);
else
p4d = (p4d_t*)pgd;
--
2.13.0


2017-07-11 17:03:38

by Kirill A. Shutemov

Subject: Re: KASAN vs. boot-time switching between 4- and 5-level paging

On Tue, Jul 11, 2017 at 07:45:48PM +0300, Andrey Ryabinin wrote:
> On 07/11/2017 06:15 PM, Andrey Ryabinin wrote:
> >
> > I reproduced this, and this is kasan bug:
> >
> > │0xffffffff84864897 <x86_early_init_platform_quirks+5> mov $0xffffffff83f1d0b8,%rdi
> > │0xffffffff8486489e <x86_early_init_platform_quirks+12> movabs $0xdffffc0000000000,%rax
> > │0xffffffff848648a8 <x86_early_init_platform_quirks+22> push %rbp
> > │0xffffffff848648a9 <x86_early_init_platform_quirks+23> mov %rdi,%rdx
> > │0xffffffff848648ac <x86_early_init_platform_quirks+26> shr $0x3,%rdx
> > │0xffffffff848648b0 <x86_early_init_platform_quirks+30> mov %rsp,%rbp
> > >│0xffffffff848648b3 <x86_early_init_platform_quirks+33> mov (%rdx,%rax,1),%al
> >
> > we crash on the last move which is a read from shadow
>
>
> Ughh, I forgot about phys_base.

Thanks! Works for me.

Can I use your Signed-off-by for a [cleaned-up version of your] patch?


--
Kirill A. Shutemov

2017-07-11 17:27:55

by Andrey Ryabinin

Subject: Re: KASAN vs. boot-time switching between 4- and 5-level paging



On 07/11/2017 08:03 PM, Kirill A. Shutemov wrote:
> On Tue, Jul 11, 2017 at 07:45:48PM +0300, Andrey Ryabinin wrote:
>> On 07/11/2017 06:15 PM, Andrey Ryabinin wrote:
>>>
>>> I reproduced this, and this is kasan bug:
>>>
>>> │0xffffffff84864897 <x86_early_init_platform_quirks+5> mov $0xffffffff83f1d0b8,%rdi
>>> │0xffffffff8486489e <x86_early_init_platform_quirks+12> movabs $0xdffffc0000000000,%rax
>>> │0xffffffff848648a8 <x86_early_init_platform_quirks+22> push %rbp
>>> │0xffffffff848648a9 <x86_early_init_platform_quirks+23> mov %rdi,%rdx
>>> │0xffffffff848648ac <x86_early_init_platform_quirks+26> shr $0x3,%rdx
>>> │0xffffffff848648b0 <x86_early_init_platform_quirks+30> mov %rsp,%rbp
>>> >│0xffffffff848648b3 <x86_early_init_platform_quirks+33> mov (%rdx,%rax,1),%al
>>>
>>> we crash on the last move which is a read from shadow
>>
>>
>> Ughh, I forgot about phys_base.
>
> Thanks! Works for me.
>
> Can use your Signed-off-by for a [cleaned up version of your] patch?

Sure.

2017-07-11 19:06:03

by Kirill A. Shutemov

Subject: Re: KASAN vs. boot-time switching between 4- and 5-level paging

> > Can use your Signed-off-by for a [cleaned up version of your] patch?
>
> Sure.

Another KASAN-related issue: dumping page tables for the KASAN shadow memory
region takes an unreasonable amount of time due to the kasan_zero_p?? tables mapped there.

The patch below helps. Any objections?
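
Rough numbers for context (assuming the layout discussed earlier): the zero shadow is populated all the way down to the pte level, so a naive walk visits every 4K slot of the shadow region:

	4-level: ~16 TB of shadow / 4 KB per pte ~= 4e9  pte entries
	5-level:  ~8 PB of shadow / 4 KB per pte ~= 2e12 pte entries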

diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index b371ab68f2d4..8601153c34e7 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -17,8 +17,8 @@
#include <linux/init.h>
#include <linux/sched.h>
#include <linux/seq_file.h>
+#include <linux/kasan.h>

-#include <asm/kasan.h>
#include <asm/pgtable.h>

/*
@@ -291,10 +291,15 @@ static void note_page(struct seq_file *m, struct pg_state *st,
static void walk_pte_level(struct seq_file *m, struct pg_state *st, pmd_t addr, unsigned long P)
{
int i;
+ unsigned long pte_addr;
pte_t *start;
pgprotval_t prot;

- start = (pte_t *)pmd_page_vaddr(addr);
+ pte_addr = pmd_page_vaddr(addr);
+ if (__pa(pte_addr) == __pa(kasan_zero_pte))
+ return;
+
+ start = (pte_t *)pte_addr;
for (i = 0; i < PTRS_PER_PTE; i++) {
prot = pte_flags(*start);
st->current_address = normalize_addr(P + i * PTE_LEVEL_MULT);
@@ -308,10 +313,15 @@ static void walk_pte_level(struct seq_file *m, struct pg_state *st, pmd_t addr,
static void walk_pmd_level(struct seq_file *m, struct pg_state *st, pud_t addr, unsigned long P)
{
int i;
+ unsigned long pmd_addr;
pmd_t *start;
pgprotval_t prot;

- start = (pmd_t *)pud_page_vaddr(addr);
+ pmd_addr = pud_page_vaddr(addr);
+ if (__pa(pmd_addr) == __pa(kasan_zero_pmd))
+ return;
+
+ start = (pmd_t *)pmd_addr;
for (i = 0; i < PTRS_PER_PMD; i++) {
st->current_address = normalize_addr(P + i * PMD_LEVEL_MULT);
if (!pmd_none(*start)) {
@@ -350,12 +360,16 @@ static bool pud_already_checked(pud_t *prev_pud, pud_t *pud, bool checkwx)
static void walk_pud_level(struct seq_file *m, struct pg_state *st, p4d_t addr, unsigned long P)
{
int i;
+ unsigned long pud_addr;
pud_t *start;
pgprotval_t prot;
pud_t *prev_pud = NULL;

- start = (pud_t *)p4d_page_vaddr(addr);
+ pud_addr = p4d_page_vaddr(addr);
+ if (__pa(pud_addr) == __pa(kasan_zero_pud))
+ return;

+ start = (pud_t *)pud_addr;
for (i = 0; i < PTRS_PER_PUD; i++) {
st->current_address = normalize_addr(P + i * PUD_LEVEL_MULT);
if (!pud_none(*start) &&
@@ -386,11 +400,15 @@ static void walk_pud_level(struct seq_file *m, struct pg_state *st, p4d_t addr,
static void walk_p4d_level(struct seq_file *m, struct pg_state *st, pgd_t addr, unsigned long P)
{
int i;
+ unsigned long p4d_addr;
p4d_t *start;
pgprotval_t prot;

- start = (p4d_t *)pgd_page_vaddr(addr);
+ p4d_addr = pgd_page_vaddr(addr);
+ if (__pa(p4d_addr) == __pa(kasan_zero_p4d))
+ return;

+ start = (p4d_t *)p4d_addr;
for (i = 0; i < PTRS_PER_P4D; i++) {
st->current_address = normalize_addr(P + i * P4D_LEVEL_MULT);
if (!p4d_none(*start)) {
--
Kirill A. Shutemov

2017-07-13 12:56:18

by Andrey Ryabinin

Subject: Re: KASAN vs. boot-time switching between 4- and 5-level paging

On 07/11/2017 10:05 PM, Kirill A. Shutemov wrote:
>>> Can use your Signed-off-by for a [cleaned up version of your] patch?
>>
>> Sure.
>
> Another KASAN-releated issue: dumping page tables for KASAN shadow memory
> region takes unreasonable time due to kasan_zero_p?? mapped there.
>
> The patch below helps. Any objections?
>

Well, the page table dump doesn't work at all with 5-level paging.
E.g. I've got this nonsense:

....
---[ Kernel Space ]---
0xffff800000000000-0xffff808000000000 512G pud
---[ Low Kernel Mapping ]---
0xffff808000000000-0xffff810000000000 512G pud
---[ vmalloc() Area ]---
0xffff810000000000-0xffff818000000000 512G pud
---[ Vmemmap ]---
0xffff818000000000-0xffffff0000000000 128512G pud
---[ ESPfix Area ]---
0xffffff0000000000-0x0000000000000000 1T pud
0x0000000000000000-0x0000000000000000 0E pgd
0x0000000000000000-0x0000000000001000 4K RW PCD GLB NX pte
0x0000000000001000-0x0000000000002000 4K pte
0x0000000000002000-0x0000000000003000 4K ro GLB NX pte
0x0000000000003000-0x0000000000004000 4K pte
0x0000000000004000-0x0000000000007000 12K RW GLB NX pte
0x0000000000007000-0x0000000000008000 4K pte
0x0000000000008000-0x0000000000108000 1M RW GLB NX pte
0x0000000000108000-0x0000000000109000 4K pte
0x0000000000109000-0x0000000000189000 512K RW GLB NX pte
0x0000000000189000-0x000000000018a000 4K pte
0x000000000018a000-0x000000000018e000 16K RW GLB NX pte
0x000000000018e000-0x000000000018f000 4K pte
0x000000000018f000-0x0000000000193000 16K RW GLB NX pte
0x0000000000193000-0x0000000000194000 4K pte
... 304 entries skipped ...
---[ EFI Runtime Services ]---
0xffffffef00000000-0xffffffff80000000 66G pud
---[ High Kernel Mapping ]---
0xffffffff80000000-0xffffffffc0000000 1G pud
...



As for KASAN, I think it would be better just to make it work faster; the patch below demonstrates the idea.



---
arch/x86/mm/dump_pagetables.c | 22 ++++++++++++++++------
1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index 0470826d2bdc..36515fba86b0 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -13,6 +13,7 @@
*/

#include <linux/debugfs.h>
+#include <linux/kasan.h>
#include <linux/mm.h>
#include <linux/init.h>
#include <linux/sched.h>
@@ -307,16 +308,19 @@ static void walk_pte_level(struct seq_file *m, struct pg_state *st, pmd_t addr,
static void walk_pmd_level(struct seq_file *m, struct pg_state *st, pud_t addr, unsigned long P)
{
int i;
- pmd_t *start;
+ pmd_t *start, *pmd_addr;
pgprotval_t prot;

- start = (pmd_t *)pud_page_vaddr(addr);
+ pmd_addr = start = (pmd_t *)pud_page_vaddr(addr);
for (i = 0; i < PTRS_PER_PMD; i++) {
st->current_address = normalize_addr(P + i * PMD_LEVEL_MULT);
if (!pmd_none(*start)) {
if (pmd_large(*start) || !pmd_present(*start)) {
prot = pmd_flags(*start);
note_page(m, st, __pgprot(prot), 3);
+ } else if (__pa(pmd_addr) == __pa(kasan_zero_pmd)) {
+ prot = pte_flags(kasan_zero_pte[0]);
+ note_page(m, st, __pgprot(prot), 4);
} else {
walk_pte_level(m, st, *start,
P + i * PMD_LEVEL_MULT);
@@ -349,11 +353,11 @@ static bool pud_already_checked(pud_t *prev_pud, pud_t *pud, bool checkwx)
static void walk_pud_level(struct seq_file *m, struct pg_state *st, p4d_t addr, unsigned long P)
{
int i;
- pud_t *start;
+ pud_t *start, *pud_addr;
pgprotval_t prot;
pud_t *prev_pud = NULL;

- start = (pud_t *)p4d_page_vaddr(addr);
+ pud_addr = start = (pud_t *)p4d_page_vaddr(addr);

for (i = 0; i < PTRS_PER_PUD; i++) {
st->current_address = normalize_addr(P + i * PUD_LEVEL_MULT);
@@ -362,6 +366,9 @@ static void walk_pud_level(struct seq_file *m, struct pg_state *st, p4d_t addr,
if (pud_large(*start) || !pud_present(*start)) {
prot = pud_flags(*start);
note_page(m, st, __pgprot(prot), 2);
+ } else if (__pa(pud_addr) == __pa(kasan_zero_pud)) {
+ prot = pte_flags(kasan_zero_pte[0]);
+ note_page(m, st, __pgprot(prot), 4);
} else {
walk_pmd_level(m, st, *start,
P + i * PUD_LEVEL_MULT);
@@ -385,10 +392,10 @@ static void walk_pud_level(struct seq_file *m, struct pg_state *st, p4d_t addr,
static void walk_p4d_level(struct seq_file *m, struct pg_state *st, pgd_t addr, unsigned long P)
{
int i;
- p4d_t *start;
+ p4d_t *start, *p4d_addr;
pgprotval_t prot;

- start = (p4d_t *)pgd_page_vaddr(addr);
+ p4d_addr = start = (p4d_t *)pgd_page_vaddr(addr);

for (i = 0; i < PTRS_PER_P4D; i++) {
st->current_address = normalize_addr(P + i * P4D_LEVEL_MULT);
@@ -396,6 +403,9 @@ static void walk_p4d_level(struct seq_file *m, struct pg_state *st, pgd_t addr,
if (p4d_large(*start) || !p4d_present(*start)) {
prot = p4d_flags(*start);
note_page(m, st, __pgprot(prot), 2);
+ } else if (__pa(p4d_addr) == __pa(kasan_zero_p4d)) {
+ prot = pte_flags(kasan_zero_pte[0]);
+ note_page(m, st, __pgprot(prot), 4);
} else {
walk_pud_level(m, st, *start,
P + i * P4D_LEVEL_MULT);
--
2.13.0

2017-07-13 13:52:35

by Kirill A. Shutemov

Subject: Re: KASAN vs. boot-time switching between 4- and 5-level paging

On Thu, Jul 13, 2017 at 03:58:29PM +0300, Andrey Ryabinin wrote:
> On 07/11/2017 10:05 PM, Kirill A. Shutemov wrote:
> >>> Can use your Signed-off-by for a [cleaned up version of your] patch?
> >>
> >> Sure.
> >
> > Another KASAN-releated issue: dumping page tables for KASAN shadow memory
> > region takes unreasonable time due to kasan_zero_p?? mapped there.
> >
> > The patch below helps. Any objections?
> >
>
> Well, page tables dump doesn't work at all on 5-level paging.
> E.g. I've got this nonsense:
>
> ....
> ---[ Kernel Space ]---
> 0xffff800000000000-0xffff808000000000 512G pud
> ---[ Low Kernel Mapping ]---
> 0xffff808000000000-0xffff810000000000 512G pud
> ---[ vmalloc() Area ]---
> 0xffff810000000000-0xffff818000000000 512G pud
> ---[ Vmemmap ]---
> 0xffff818000000000-0xffffff0000000000 128512G pud
> ---[ ESPfix Area ]---
> 0xffffff0000000000-0x0000000000000000 1T pud
> 0x0000000000000000-0x0000000000000000 0E pgd
> 0x0000000000000000-0x0000000000001000 4K RW PCD GLB NX pte
> 0x0000000000001000-0x0000000000002000 4K pte
> 0x0000000000002000-0x0000000000003000 4K ro GLB NX pte
> 0x0000000000003000-0x0000000000004000 4K pte
> 0x0000000000004000-0x0000000000007000 12K RW GLB NX pte
> 0x0000000000007000-0x0000000000008000 4K pte
> 0x0000000000008000-0x0000000000108000 1M RW GLB NX pte
> 0x0000000000108000-0x0000000000109000 4K pte
> 0x0000000000109000-0x0000000000189000 512K RW GLB NX pte
> 0x0000000000189000-0x000000000018a000 4K pte
> 0x000000000018a000-0x000000000018e000 16K RW GLB NX pte
> 0x000000000018e000-0x000000000018f000 4K pte
> 0x000000000018f000-0x0000000000193000 16K RW GLB NX pte
> 0x0000000000193000-0x0000000000194000 4K pte
> ... 304 entries skipped ...
> ---[ EFI Runtime Services ]---
> 0xffffffef00000000-0xffffffff80000000 66G pud
> ---[ High Kernel Mapping ]---
> 0xffffffff80000000-0xffffffffc0000000 1G pud
> ...

Hm. I don't see this:

...
[ 0.247532] 0xff9e938000000000-0xff9f000000000000 111104G p4d
[ 0.247733] 0xff9f000000000000-0xffff000000000000 24P pgd
[ 0.248066] 0xffff000000000000-0xffffff0000000000 255T p4d
[ 0.248290] ---[ ESPfix Area ]---
[ 0.248393] 0xffffff0000000000-0xffffff8000000000 512G p4d
[ 0.248663] 0xffffff8000000000-0xffffffef00000000 444G pud
[ 0.248892] ---[ EFI Runtime Services ]---
[ 0.248996] 0xffffffef00000000-0xfffffffec0000000 63G pud
[ 0.249308] 0xfffffffec0000000-0xfffffffefe400000 996M pmd
...

Do you have commit "x86/dump_pagetables: Generalize address normalization"
in your tree?

https://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git/commit/?h=la57/boot-switching/v2&id=13327fec85ffe95d9c8a3f57ba174bf5d5c1fb01

> As for KASAN, I think it would be better just to make it work faster,
> the patch below demonstrates the idea.

Okay, let me test this.

> ---
> arch/x86/mm/dump_pagetables.c | 22 ++++++++++++++++------
> 1 file changed, 16 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
> index 0470826d2bdc..36515fba86b0 100644
> --- a/arch/x86/mm/dump_pagetables.c
> +++ b/arch/x86/mm/dump_pagetables.c
> @@ -13,6 +13,7 @@
> */
>
> #include <linux/debugfs.h>
> +#include <linux/kasan.h>
> #include <linux/mm.h>
> #include <linux/init.h>
> #include <linux/sched.h>

The <asm/kasan.h> include can be dropped. And I don't think it compiles with KASAN
disabled.

For reference, the patch I use now:

https://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git/commit/?h=la57/boot-switching/v2&id=c4b1439f719b1689a1cfca9c0df17b9f8b8462b9

--
Kirill A. Shutemov

2017-07-13 14:15:35

by Kirill A. Shutemov

Subject: Re: KASAN vs. boot-time switching between 4- and 5-level paging

On Thu, Jul 13, 2017 at 04:52:28PM +0300, Kirill A. Shutemov wrote:
> On Thu, Jul 13, 2017 at 03:58:29PM +0300, Andrey Ryabinin wrote:
> > On 07/11/2017 10:05 PM, Kirill A. Shutemov wrote:
> > >>> Can use your Signed-off-by for a [cleaned up version of your] patch?
> > >>
> > >> Sure.
> > >
> > > Another KASAN-releated issue: dumping page tables for KASAN shadow memory
> > > region takes unreasonable time due to kasan_zero_p?? mapped there.
> > >
> > > The patch below helps. Any objections?
> > >
> >
> > Well, page tables dump doesn't work at all on 5-level paging.
> > E.g. I've got this nonsense:
> >
> > ....
> > ---[ Kernel Space ]---
> > 0xffff800000000000-0xffff808000000000 512G pud
> > ---[ Low Kernel Mapping ]---
> > 0xffff808000000000-0xffff810000000000 512G pud
> > ---[ vmalloc() Area ]---
> > 0xffff810000000000-0xffff818000000000 512G pud
> > ---[ Vmemmap ]---
> > 0xffff818000000000-0xffffff0000000000 128512G pud
> > ---[ ESPfix Area ]---
> > 0xffffff0000000000-0x0000000000000000 1T pud
> > 0x0000000000000000-0x0000000000000000 0E pgd
> > 0x0000000000000000-0x0000000000001000 4K RW PCD GLB NX pte
> > 0x0000000000001000-0x0000000000002000 4K pte
> > 0x0000000000002000-0x0000000000003000 4K ro GLB NX pte
> > 0x0000000000003000-0x0000000000004000 4K pte
> > 0x0000000000004000-0x0000000000007000 12K RW GLB NX pte
> > 0x0000000000007000-0x0000000000008000 4K pte
> > 0x0000000000008000-0x0000000000108000 1M RW GLB NX pte
> > 0x0000000000108000-0x0000000000109000 4K pte
> > 0x0000000000109000-0x0000000000189000 512K RW GLB NX pte
> > 0x0000000000189000-0x000000000018a000 4K pte
> > 0x000000000018a000-0x000000000018e000 16K RW GLB NX pte
> > 0x000000000018e000-0x000000000018f000 4K pte
> > 0x000000000018f000-0x0000000000193000 16K RW GLB NX pte
> > 0x0000000000193000-0x0000000000194000 4K pte
> > ... 304 entries skipped ...
> > ---[ EFI Runtime Services ]---
> > 0xffffffef00000000-0xffffffff80000000 66G pud
> > ---[ High Kernel Mapping ]---
> > 0xffffffff80000000-0xffffffffc0000000 1G pud
> > ...
>
> Hm. I don't see this:
>
> ...
> [ 0.247532] 0xff9e938000000000-0xff9f000000000000 111104G p4d
> [ 0.247733] 0xff9f000000000000-0xffff000000000000 24P pgd
> [ 0.248066] 0xffff000000000000-0xffffff0000000000 255T p4d
> [ 0.248290] ---[ ESPfix Area ]---
> [ 0.248393] 0xffffff0000000000-0xffffff8000000000 512G p4d
> [ 0.248663] 0xffffff8000000000-0xffffffef00000000 444G pud
> [ 0.248892] ---[ EFI Runtime Services ]---
> [ 0.248996] 0xffffffef00000000-0xfffffffec0000000 63G pud
> [ 0.249308] 0xfffffffec0000000-0xfffffffefe400000 996M pmd
> ...
>
> Do you have commit "x86/dump_pagetables: Generalize address normalization"
> in your tree?
>
> https://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git/commit/?h=la57/boot-switching/v2&id=13327fec85ffe95d9c8a3f57ba174bf5d5c1fb01
>
> > As for KASAN, I think it would be better just to make it work faster,
> > the patch below demonstrates the idea.
>
> Okay, let me test this.

The patch works for me.

The problem is not exclusive to 5-level paging, so could you prepare and
push a proper patch upstream?

--
Kirill A. Shutemov

2017-07-13 14:17:39

by Andrey Ryabinin

Subject: Re: KASAN vs. boot-time switching between 4- and 5-level paging



On 07/13/2017 05:15 PM, Kirill A. Shutemov wrote:

>>
>> Hm. I don't see this:
>>
>> ...
>> [ 0.247532] 0xff9e938000000000-0xff9f000000000000 111104G p4d
>> [ 0.247733] 0xff9f000000000000-0xffff000000000000 24P pgd
>> [ 0.248066] 0xffff000000000000-0xffffff0000000000 255T p4d
>> [ 0.248290] ---[ ESPfix Area ]---
>> [ 0.248393] 0xffffff0000000000-0xffffff8000000000 512G p4d
>> [ 0.248663] 0xffffff8000000000-0xffffffef00000000 444G pud
>> [ 0.248892] ---[ EFI Runtime Services ]---
>> [ 0.248996] 0xffffffef00000000-0xfffffffec0000000 63G pud
>> [ 0.249308] 0xfffffffec0000000-0xfffffffefe400000 996M pmd
>> ...
>>
>> Do you have commit "x86/dump_pagetables: Generalize address normalization"
>> in your tree?
>>

Nope. Applied now, it helped.

>> https://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git/commit/?h=la57/boot-switching/v2&id=13327fec85ffe95d9c8a3f57ba174bf5d5c1fb01
>>
>>> As for KASAN, I think it would be better just to make it work faster,
>>> the patch below demonstrates the idea.
>>
>> Okay, let me test this.
>
> The patch works for me.
>
> The problem is not exclusive to 5-level paging, so could you prepare and
> push proper patch upstream?
>

Sure, will do

2017-07-24 12:13:46

by Kirill A. Shutemov

Subject: Re: KASAN vs. boot-time switching between 4- and 5-level paging

On Thu, Jul 13, 2017 at 05:19:22PM +0300, Andrey Ryabinin wrote:
> >> https://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git/commit/?h=la57/boot-switching/v2&id=13327fec85ffe95d9c8a3f57ba174bf5d5c1fb01
> >>
> >>> As for KASAN, I think it would be better just to make it work faster,
> >>> the patch below demonstrates the idea.
> >>
> >> Okay, let me test this.
> >
> > The patch works for me.
> >
> > The problem is not exclusive to 5-level paging, so could you prepare and
> > push proper patch upstream?
> >
>
> Sure, will do

Andrey, any follow-up on this?

--
Kirill A. Shutemov

2017-07-24 14:04:58

by Andrey Ryabinin

Subject: Re: KASAN vs. boot-time switching between 4- and 5-level paging



On 07/24/2017 03:13 PM, Kirill A. Shutemov wrote:
> On Thu, Jul 13, 2017 at 05:19:22PM +0300, Andrey Ryabinin wrote:
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git/commit/?h=la57/boot-switching/v2&id=13327fec85ffe95d9c8a3f57ba174bf5d5c1fb01
>>>>
>>>>> As for KASAN, I think it would be better just to make it work faster,
>>>>> the patch below demonstrates the idea.
>>>>
>>>> Okay, let me test this.
>>>
>>> The patch works for me.
>>>
>>> The problem is not exclusive to 5-level paging, so could you prepare and
>>> push proper patch upstream?
>>>
>>
>> Sure, will do
>
> Andrey, any follow up on this?
>

Sorry, I've been a bit busy. I'll send the patch shortly.