2022-11-10 20:41:37

by Sean Christopherson

Subject: [PATCH v2 0/5] x86/kasan: Bug fixes for recent CEA changes

Three fixes for the recent changes to how KASAN populates shadows for
the per-CPU portion of the CPU entry areas. The v1 versions were posted
independently as I kept root causing issues after posting individual fixes.

v2:
- Map the entire per-CPU area in one shot. [Andrey]
- Use the "early", i.e. read-only, variant to populate the shadow for
the shared portion (read-only IDT mapping) of the CEA. [Andrey]

v1:
- https://lore.kernel.org/all/[email protected]
- https://lore.kernel.org/all/[email protected]
- https://lore.kernel.org/all/[email protected]

Sean Christopherson (5):
x86/mm: Recompute physical address for every page of per-CPU CEA
mapping
x86/mm: Populate KASAN shadow for entire per-CPU range of CPU entry
area
x86/kasan: Rename local CPU_ENTRY_AREA variables to shorten names
x86/kasan: Add helpers to align shadow addresses up and down
x86/kasan: Populate shadow for shared chunk of the CPU entry area

arch/x86/mm/cpu_entry_area.c | 10 +++-----
arch/x86/mm/kasan_init_64.c | 50 +++++++++++++++++++++++-------------
2 files changed, 36 insertions(+), 24 deletions(-)


base-commit: 0008712a508f72242d185142cfdbd0646a661a18
--
2.38.1.431.g37b22c650d-goog



2022-11-10 20:42:05

by Sean Christopherson

Subject: [PATCH v2 2/5] x86/mm: Populate KASAN shadow for entire per-CPU range of CPU entry area

Populate a KASAN shadow for the entire possible per-CPU range of the CPU
entry area instead of requiring that each individual chunk map a shadow.
Mapping shadows individually is error prone, e.g. the per-CPU GDT mapping
was left behind, which can lead to not-present page faults during KASAN
validation if the kernel performs a software lookup into the GDT. The DS
buffer is also likely affected.

The motivation for mapping the per-CPU areas on-demand was to avoid
mapping the entire 512GiB range that's reserved for the CPU entry area;
shaving a few bytes by not creating shadows for potentially unused memory
was not a goal.
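
(For a sense of scale, a standalone sketch that is not part of the patch:
generic KASAN maps 8 bytes of memory per byte of shadow, so fully backing
the shadow for the reserved 512GiB range would cost on the order of 64GiB.)

/*
 * Standalone sketch, not from the patch: the cost of backing the shadow
 * for the full CPU entry area range.  KASAN_SHADOW_SCALE_SHIFT == 3 is
 * the generic KASAN scale (8 bytes of memory per shadow byte).
 */
#include <stdio.h>

#define KASAN_SHADOW_SCALE_SHIFT	3

int main(void)
{
        unsigned long long cea_map_size = 512ULL << 30; /* 512GiB reserved for the CEA */
        unsigned long long shadow_size = cea_map_size >> KASAN_SHADOW_SCALE_SHIFT;

        printf("shadow for the full CEA range: %llu GiB\n", shadow_size >> 30);	/* 64 */
        return 0;
}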

The bug is most easily reproduced by doing a sigreturn with a garbage
CS in the sigcontext, e.g.

int main(void)
{
        struct sigcontext regs;

        syscall(__NR_mmap, 0x1ffff000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);
        syscall(__NR_mmap, 0x20000000ul, 0x1000000ul, 7ul, 0x32ul, -1, 0ul);
        syscall(__NR_mmap, 0x21000000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);

        memset(&regs, 0, sizeof(regs));
        regs.cs = 0x1d0;
        syscall(__NR_rt_sigreturn);
        return 0;
}

to coerce the kernel into doing a GDT lookup to compute CS.base when
reading the instruction bytes on the subsequent #GP to determine whether
or not the #GP is something the kernel should handle, e.g. to fixup UMIP
violations or to emulate CLI/STI for IOPL=3 applications.

BUG: unable to handle page fault for address: fffffbc8379ace00
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 16c03a067 P4D 16c03a067 PUD 15b990067 PMD 15b98f067 PTE 0
Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 3 PID: 851 Comm: r2 Not tainted 6.1.0-rc3-next-20221103+ #432
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:kasan_check_range+0xdf/0x190
Call Trace:
<TASK>
get_desc+0xb0/0x1d0
insn_get_seg_base+0x104/0x270
insn_fetch_from_user+0x66/0x80
fixup_umip_exception+0xb1/0x530
exc_general_protection+0x181/0x210
asm_exc_general_protection+0x22/0x30
RIP: 0003:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 0003:0000000000000000 EFLAGS: 00000202
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000000001d0
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
</TASK>

Fixes: 9fd429c28073 ("x86/kasan: Map shadow for percpu pages on demand")
Reported-by: [email protected]
Suggested-by: Andrey Ryabinin <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Cc: [email protected]
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/mm/cpu_entry_area.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/x86/mm/cpu_entry_area.c b/arch/x86/mm/cpu_entry_area.c
index d831aae94b41..7c855dffcdc2 100644
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -91,11 +91,6 @@ void cea_set_pte(void *cea_vaddr, phys_addr_t pa, pgprot_t flags)
 static void __init
 cea_map_percpu_pages(void *cea_vaddr, void *ptr, int pages, pgprot_t prot)
 {
-        phys_addr_t pa = per_cpu_ptr_to_phys(ptr);
-
-        kasan_populate_shadow_for_vaddr(cea_vaddr, pages * PAGE_SIZE,
-                                        early_pfn_to_nid(PFN_DOWN(pa)));
-
         for ( ; pages; pages--, cea_vaddr+= PAGE_SIZE, ptr += PAGE_SIZE)
                 cea_set_pte(cea_vaddr, per_cpu_ptr_to_phys(ptr), prot);
 }
@@ -195,6 +190,9 @@ static void __init setup_cpu_entry_area(unsigned int cpu)
         pgprot_t tss_prot = PAGE_KERNEL;
 #endif
 
+        kasan_populate_shadow_for_vaddr(cea, CPU_ENTRY_AREA_SIZE,
+                                        early_cpu_to_node(cpu));
+
         cea_set_pte(&cea->gdt, get_cpu_gdt_paddr(cpu), gdt_prot);
 
         cea_map_percpu_pages(&cea->entry_stack_page,
--
2.38.1.431.g37b22c650d-goog


2022-11-10 20:55:27

by Sean Christopherson

Subject: [PATCH v2 5/5] x86/kasan: Populate shadow for shared chunk of the CPU entry area

Populate the shadow for the shared portion of the CPU entry area, i.e.
the read-only IDT mapping, during KASAN initialization. A recent change
modified KASAN to map the per-CPU areas on-demand, but forgot to keep a
shadow for the common area that is shared amongst all CPUs.

Map the common area in KASAN init instead of letting idt_map_in_cea() do
the dirty work so that it Just Works in the unlikely event more shared
data is shoved into the CPU entry area.

The bug manifests as a not-present #PF when software attempts to look up
an IDT entry, e.g. when KVM is handling IRQs on Intel CPUs (KVM performs
direct CALL to the IRQ handler to avoid the overhead of INTn):

BUG: unable to handle page fault for address: fffffbc0000001d8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 16c03a067 P4D 16c03a067 PUD 0
Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 5 PID: 901 Comm: repro Tainted: G W 6.1.0-rc3+ #410
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:kasan_check_range+0xdf/0x190
vmx_handle_exit_irqoff+0x152/0x290 [kvm_intel]
vcpu_run+0x1d89/0x2bd0 [kvm]
kvm_arch_vcpu_ioctl_run+0x3ce/0xa70 [kvm]
kvm_vcpu_ioctl+0x349/0x900 [kvm]
__x64_sys_ioctl+0xb8/0xf0
do_syscall_64+0x2b/0x50
entry_SYSCALL_64_after_hwframe+0x46/0xb0
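
(A quick standalone cross-check, not part of the patch: assuming the
documented x86-64 defaults of CPU_ENTRY_AREA_BASE == CPU_ENTRY_AREA_RO_IDT
== 0xfffffe0000000000 and KASAN_SHADOW_OFFSET == 0xdffffc0000000000, the
faulting address above is exactly inside the KASAN shadow of the shared
read-only IDT page.)

/*
 * Standalone sketch; the constants are the usual x86-64 defaults and are
 * assumptions for illustration, not values taken from the patch.
 */
#include <stdio.h>

#define KASAN_SHADOW_SCALE_SHIFT	3
#define KASAN_SHADOW_OFFSET		0xdffffc0000000000UL
#define CPU_ENTRY_AREA_RO_IDT		0xfffffe0000000000UL	/* == CPU_ENTRY_AREA_BASE */

int main(void)
{
        unsigned long fault = 0xfffffbc0000001d8UL;
        unsigned long idt_shadow = (CPU_ENTRY_AREA_RO_IDT >> KASAN_SHADOW_SCALE_SHIFT) +
                                   KASAN_SHADOW_OFFSET;

        /*
         * Prints 0xfffffbc000000000 and 0x1d8: the fault sits 0x1d8 bytes into
         * the shadow of the RO IDT page, i.e. the shadow of IDT offset
         * 0x1d8 * 8 = 0xec0 within that page.
         */
        printf("shadow(RO_IDT) = 0x%lx, offset = 0x%lx\n", idt_shadow, fault - idt_shadow);
        return 0;
}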

Fixes: 9fd429c28073 ("x86/kasan: Map shadow for percpu pages on demand")
Reported-by: [email protected]
Cc: Andrey Ryabinin <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/mm/kasan_init_64.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index afc5e129ca7b..af82046348a0 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -341,7 +341,7 @@ void __init kasan_populate_shadow_for_vaddr(void *va, size_t size, int nid)
 
 void __init kasan_init(void)
 {
-        unsigned long shadow_cea_begin, shadow_cea_end;
+        unsigned long shadow_cea_begin, shadow_cea_per_cpu_begin, shadow_cea_end;
         int i;
 
         memcpy(early_top_pgt, init_top_pgt, sizeof(early_top_pgt));
@@ -384,6 +384,7 @@ void __init kasan_init(void)
         }
 
         shadow_cea_begin = kasan_mem_to_shadow_align_down(CPU_ENTRY_AREA_BASE);
+        shadow_cea_per_cpu_begin = kasan_mem_to_shadow_align_up(CPU_ENTRY_AREA_PER_CPU);
         shadow_cea_end = kasan_mem_to_shadow_align_up(CPU_ENTRY_AREA_BASE +
                                                       CPU_ENTRY_AREA_MAP_SIZE);
 
@@ -409,6 +410,15 @@ void __init kasan_init(void)
                 kasan_mem_to_shadow((void *)VMALLOC_END + 1),
                 (void *)shadow_cea_begin);
 
+        /*
+         * Populate the shadow for the shared portion of the CPU entry area.
+         * Shadows for the per-CPU areas are mapped on-demand, as each CPU's
+         * area is randomly placed somewhere in the 512GiB range and mapping
+         * the entire 512GiB range is prohibitively expensive.
+         */
+        kasan_populate_early_shadow((void *)shadow_cea_begin,
+                                    (void *)shadow_cea_per_cpu_begin);
+
         kasan_populate_early_shadow((void *)shadow_cea_end,
                         kasan_mem_to_shadow((void *)__START_KERNEL_map));

--
2.38.1.431.g37b22c650d-goog


2022-11-10 21:01:45

by Sean Christopherson

Subject: [PATCH v2 4/5] x86/kasan: Add helpers to align shadow addresses up and down

Add helpers to dedup code for aligning shadow address up/down to page
boundaries when translating an address to its shadow.

No functional change intended.
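
(For illustration, a standalone sketch of what the helpers end up
computing; kasan_mem_to_shadow() is stubbed with the generic
shift-and-offset translation, and all constants are assumptions for the
example, not values from the patch.)

/*
 * Standalone sketch of the two helpers; constants are illustrative
 * assumptions, not taken from the patch.
 */
#include <stdio.h>

#define PAGE_SIZE			4096UL
#define KASAN_SHADOW_SCALE_SHIFT	3
#define KASAN_SHADOW_OFFSET		0xdffffc0000000000UL

static unsigned long kasan_mem_to_shadow(unsigned long va)
{
        return (va >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

static unsigned long kasan_mem_to_shadow_align_down(unsigned long va)
{
        return kasan_mem_to_shadow(va) & ~(PAGE_SIZE - 1);			/* round_down() */
}

static unsigned long kasan_mem_to_shadow_align_up(unsigned long va)
{
        return (kasan_mem_to_shadow(va) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);	/* round_up() */
}

int main(void)
{
        unsigned long va = 0xfffffe0000001000UL;	/* arbitrary example address */

        printf("down: 0x%lx up: 0x%lx\n",
               kasan_mem_to_shadow_align_down(va), kasan_mem_to_shadow_align_up(va));
        return 0;
}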

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/mm/kasan_init_64.c | 40 ++++++++++++++++++++-----------------
1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index ad7872ae10ed..afc5e129ca7b 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -316,22 +316,33 @@ void __init kasan_early_init(void)
         kasan_map_early_shadow(init_top_pgt);
 }
 
+static unsigned long kasan_mem_to_shadow_align_down(unsigned long va)
+{
+        unsigned long shadow = (unsigned long)kasan_mem_to_shadow((void *)va);
+
+        return round_down(shadow, PAGE_SIZE);
+}
+
+static unsigned long kasan_mem_to_shadow_align_up(unsigned long va)
+{
+        unsigned long shadow = (unsigned long)kasan_mem_to_shadow((void *)va);
+
+        return round_up(shadow, PAGE_SIZE);
+}
+
 void __init kasan_populate_shadow_for_vaddr(void *va, size_t size, int nid)
 {
         unsigned long shadow_start, shadow_end;
 
-        shadow_start = (unsigned long)kasan_mem_to_shadow(va);
-        shadow_start = round_down(shadow_start, PAGE_SIZE);
-        shadow_end = (unsigned long)kasan_mem_to_shadow(va + size);
-        shadow_end = round_up(shadow_end, PAGE_SIZE);
-
+        shadow_start = kasan_mem_to_shadow_align_down((unsigned long)va);
+        shadow_end = kasan_mem_to_shadow_align_up((unsigned long)va + size);
         kasan_populate_shadow(shadow_start, shadow_end, nid);
 }
 
 void __init kasan_init(void)
 {
+        unsigned long shadow_cea_begin, shadow_cea_end;
         int i;
-        void *shadow_cea_begin, *shadow_cea_end;
 
         memcpy(early_top_pgt, init_top_pgt, sizeof(early_top_pgt));
 
@@ -372,16 +383,9 @@ void __init kasan_init(void)
                 map_range(&pfn_mapped[i]);
         }
 
-        shadow_cea_begin = (void *)CPU_ENTRY_AREA_BASE;
-        shadow_cea_begin = kasan_mem_to_shadow(shadow_cea_begin);
-        shadow_cea_begin = (void *)round_down(
-                        (unsigned long)shadow_cea_begin, PAGE_SIZE);
-
-        shadow_cea_end = (void *)(CPU_ENTRY_AREA_BASE +
-                                  CPU_ENTRY_AREA_MAP_SIZE);
-        shadow_cea_end = kasan_mem_to_shadow(shadow_cea_end);
-        shadow_cea_end = (void *)round_up(
-                        (unsigned long)shadow_cea_end, PAGE_SIZE);
+        shadow_cea_begin = kasan_mem_to_shadow_align_down(CPU_ENTRY_AREA_BASE);
+        shadow_cea_end = kasan_mem_to_shadow_align_up(CPU_ENTRY_AREA_BASE +
+                                                      CPU_ENTRY_AREA_MAP_SIZE);
 
         kasan_populate_early_shadow(
                 kasan_mem_to_shadow((void *)PAGE_OFFSET + MAXMEM),
@@ -403,9 +407,9 @@ void __init kasan_init(void)
 
         kasan_populate_early_shadow(
                 kasan_mem_to_shadow((void *)VMALLOC_END + 1),
-                shadow_cea_begin);
+                (void *)shadow_cea_begin);
 
-        kasan_populate_early_shadow(shadow_cea_end,
+        kasan_populate_early_shadow((void *)shadow_cea_end,
                         kasan_mem_to_shadow((void *)__START_KERNEL_map));
 
         kasan_populate_shadow((unsigned long)kasan_mem_to_shadow(_stext),
--
2.38.1.431.g37b22c650d-goog


2022-11-14 13:52:11

by Peter Zijlstra

Subject: Re: [PATCH v2 0/5] x86/kasan: Bug fixes for recent CEA changes

On Thu, Nov 10, 2022 at 08:34:59PM +0000, Sean Christopherson wrote:
> Three fixes for the recent changes to how KASAN populates shadows for
> the per-CPU portion of the CPU entry areas. The v1 versions were posted
> independently as I kept root causing issues after posting individual fixes.
>
> v2:
> - Map the entire per-CPU area in one shot. [Andrey]
> - Use the "early", i.e. read-only, variant to populate the shadow for
> the shared portion (read-only IDT mapping) of the CEA. [Andrey]
>
> v1:
> - https://lore.kernel.org/all/[email protected]
> - https://lore.kernel.org/all/[email protected]
> - https://lore.kernel.org/all/[email protected]
>
> Sean Christopherson (5):
> x86/mm: Recompute physical address for every page of per-CPU CEA
> mapping
> x86/mm: Populate KASAN shadow for entire per-CPU range of CPU entry
> area
> x86/kasan: Rename local CPU_ENTRY_AREA variables to shorten names
> x86/kasan: Add helpers to align shadow addresses up and down
> x86/kasan: Populate shadow for shared chunk of the CPU entry area
>
> arch/x86/mm/cpu_entry_area.c | 10 +++-----
> arch/x86/mm/kasan_init_64.c | 50 +++++++++++++++++++++++-------------
> 2 files changed, 36 insertions(+), 24 deletions(-)

Thanks for cleaning up that mess!

2022-11-14 14:38:36

by Andrey Ryabinin

Subject: Re: [PATCH v2 2/5] x86/mm: Populate KASAN shadow for entire per-CPU range of CPU entry area



On 11/10/22 23:35, Sean Christopherson wrote:
> Populate a KASAN shadow for the entire possible per-CPU range of the CPU
> entry area instead of requiring that each individual chunk map a shadow.
> Mapping shadows individually is error prone, e.g. the per-CPU GDT mapping
> was left behind, which can lead to not-present page faults during KASAN
> validation if the kernel performs a software lookup into the GDT. The DS
> buffer is also likely affected.
>
> The motivation for mapping the per-CPU areas on-demand was to avoid
> mapping the entire 512GiB range that's reserved for the CPU entry area,
> shaving a few bytes by not creating shadows for potentially unused memory
> was not a goal.
>
> The bug is most easily reproduced by doing a sigreturn with a garbage
> CS in the sigcontext, e.g.
>
> int main(void)
> {
> struct sigcontext regs;
>
> syscall(__NR_mmap, 0x1ffff000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);
> syscall(__NR_mmap, 0x20000000ul, 0x1000000ul, 7ul, 0x32ul, -1, 0ul);
> syscall(__NR_mmap, 0x21000000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);
>
> memset(&regs, 0, sizeof(regs));
> regs.cs = 0x1d0;
> syscall(__NR_rt_sigreturn);
> return 0;
> }
>
> to coerce the kernel into doing a GDT lookup to compute CS.base when
> reading the instruction bytes on the subsequent #GP to determine whether
> or not the #GP is something the kernel should handle, e.g. to fixup UMIP
> violations or to emulate CLI/STI for IOPL=3 applications.
>
> BUG: unable to handle page fault for address: fffffbc8379ace00
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 16c03a067 P4D 16c03a067 PUD 15b990067 PMD 15b98f067 PTE 0
> Oops: 0000 [#1] PREEMPT SMP KASAN
> CPU: 3 PID: 851 Comm: r2 Not tainted 6.1.0-rc3-next-20221103+ #432
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> RIP: 0010:kasan_check_range+0xdf/0x190
> Call Trace:
> <TASK>
> get_desc+0xb0/0x1d0
> insn_get_seg_base+0x104/0x270
> insn_fetch_from_user+0x66/0x80
> fixup_umip_exception+0xb1/0x530
> exc_general_protection+0x181/0x210
> asm_exc_general_protection+0x22/0x30
> RIP: 0003:0x0
> Code: Unable to access opcode bytes at 0xffffffffffffffd6.
> RSP: 0003:0000000000000000 EFLAGS: 00000202
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000000001d0
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> </TASK>
>
> Fixes: 9fd429c28073 ("x86/kasan: Map shadow for percpu pages on demand")
> Reported-by: [email protected]
> Suggested-by: Andrey Ryabinin <[email protected]>
> Cc: Alexander Potapenko <[email protected]>
> Cc: Andrey Konovalov <[email protected]>
> Cc: Dmitry Vyukov <[email protected]>
> Cc: Vincenzo Frascino <[email protected]>
> Cc: [email protected]
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/mm/cpu_entry_area.c | 8 +++-----
> 1 file changed, 3 insertions(+), 5 deletions(-)
>

Reviewed-by: Andrey Ryabinin <[email protected]>

2022-11-14 14:57:42

by Andrey Ryabinin

Subject: Re: [PATCH v2 4/5] x86/kasan: Add helpers to align shadow addresses up and down



On 11/10/22 23:35, Sean Christopherson wrote:
> Add helpers to dedup code for aligning shadow address up/down to page
> boundaries when translating an address to its shadow.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/mm/kasan_init_64.c | 40 ++++++++++++++++++++-----------------
> 1 file changed, 22 insertions(+), 18 deletions(-)
>


Reviewed-by: Andrey Ryabinin <[email protected]>

2022-11-14 15:25:09

by Andrey Ryabinin

Subject: Re: [PATCH v2 5/5] x86/kasan: Populate shadow for shared chunk of the CPU entry area



On 11/10/22 23:35, Sean Christopherson wrote:

>
> + /*
> + * Populate the shadow for the shared portion of the CPU entry area.
> + * Shadows for the per-CPU areas are mapped on-demand, as each CPU's
> + * area is randomly placed somewhere in the 512GiB range and mapping
> + * the entire 512GiB range is prohibitively expensive.
> + */
> + kasan_populate_early_shadow((void *)shadow_cea_begin,
> + (void *)shadow_cea_per_cpu_begin);
> +

I know I suggested using "early" here, but I just realized that this might be a problem.
This will actually map the shadow page for 8 pages (1 << KASAN_SHADOW_SCALE_SHIFT) of the original memory.
If some per-CPU entry area starts right at CPU_ENTRY_AREA_PER_CPU, the shadow for it will
be covered with kasan_early_shadow_page instead of the usual one.

So we need to go back to your v1 patch, or alternatively we can round up CPU_ENTRY_AREA_PER_CPU:
#define CPU_ENTRY_AREA_PER_CPU (CPU_ENTRY_AREA_RO_IDT + (PAGE_SIZE << KASAN_SHADOW_SCALE_SHIFT))

Such a change would also require fixing up the max_cea calculation in init_cea_offsets().


Going back to kasan_populate_shadow() seems like the safer and easier choice. The only disadvantage
is that we might waste one page, which is not much compared to the overall KASAN memory overhead.
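
(A rough standalone illustration of the granularity involved, using the
generic KASAN constants rather than anything from this thread: one page of
shadow covers eight pages of real memory, so rounding a shadow boundary up
to PAGE_SIZE can move the covered boundary in the original address space by
up to 32KiB.)

/*
 * Standalone sketch; PAGE_SIZE and the scale shift are the usual defaults,
 * shown only for illustration.
 */
#include <stdio.h>

#define PAGE_SIZE			4096UL
#define KASAN_SHADOW_SCALE_SHIFT	3

int main(void)
{
        /* How much of the original address space one shadow page covers. */
        unsigned long span = PAGE_SIZE << KASAN_SHADOW_SCALE_SHIFT;

        printf("one shadow page covers %lu KiB of memory\n", span >> 10);	/* 32 */
        return 0;
}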



> kasan_populate_early_shadow((void *)shadow_cea_end,
> kasan_mem_to_shadow((void *)__START_KERNEL_map));
>

2022-11-14 18:12:12

by Sean Christopherson

Subject: Re: [PATCH v2 5/5] x86/kasan: Populate shadow for shared chunk of the CPU entry area

On Mon, Nov 14, 2022, Peter Zijlstra wrote:
> On Mon, Nov 14, 2022 at 05:44:00PM +0300, Andrey Ryabinin wrote:
> > Going back kasan_populate_shadow() seems like safer and easier choice.
> > The only disadvantage of it that we might waste 1 page, which is not
> > much compared to the KASAN memory overhead.
>
> So the below delta?
>
> ---
> --- a/arch/x86/mm/kasan_init_64.c
> +++ b/arch/x86/mm/kasan_init_64.c
> @@ -388,7 +388,7 @@ void __init kasan_init(void)
> shadow_cea_end = kasan_mem_to_shadow_align_up(CPU_ENTRY_AREA_BASE +
> CPU_ENTRY_AREA_MAP_SIZE);
>
> - kasan_populate_early_shadow(
> + kasan_populate_shadow(
> kasan_mem_to_shadow((void *)PAGE_OFFSET + MAXMEM),
> kasan_mem_to_shadow((void *)VMALLOC_START));

Wrong one, that's the existing mapping. To get back to v1:

diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index af82046348a0..0302491d799d 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -416,8 +416,8 @@ void __init kasan_init(void)
          * area is randomly placed somewhere in the 512GiB range and mapping
          * the entire 512GiB range is prohibitively expensive.
          */
-        kasan_populate_early_shadow((void *)shadow_cea_begin,
-                                    (void *)shadow_cea_per_cpu_begin);
+        kasan_populate_shadow(shadow_cea_begin,
+                              shadow_cea_per_cpu_begin, 0);
 
         kasan_populate_early_shadow((void *)shadow_cea_end,
                         kasan_mem_to_shadow((void *)__START_KERNEL_map));

2022-11-14 18:16:43

by Peter Zijlstra

Subject: Re: [PATCH v2 5/5] x86/kasan: Populate shadow for shared chunk of the CPU entry area

On Mon, Nov 14, 2022 at 05:44:00PM +0300, Andrey Ryabinin wrote:
> Going back kasan_populate_shadow() seems like safer and easier choice.
> The only disadvantage of it that we might waste 1 page, which is not
> much compared to the KASAN memory overhead.

So the below delta?

---
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -388,7 +388,7 @@ void __init kasan_init(void)
         shadow_cea_end = kasan_mem_to_shadow_align_up(CPU_ENTRY_AREA_BASE +
                                                       CPU_ENTRY_AREA_MAP_SIZE);
 
-        kasan_populate_early_shadow(
+        kasan_populate_shadow(
                 kasan_mem_to_shadow((void *)PAGE_OFFSET + MAXMEM),
                 kasan_mem_to_shadow((void *)VMALLOC_START));


2022-11-14 22:21:21

by Peter Zijlstra

Subject: Re: [PATCH v2 5/5] x86/kasan: Populate shadow for shared chunk of the CPU entry area

On Mon, Nov 14, 2022 at 05:53:43PM +0000, Sean Christopherson wrote:

> Wrong one, that's the existing mapping. To get back to v1:
>
> diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
> index af82046348a0..0302491d799d 100644
> --- a/arch/x86/mm/kasan_init_64.c
> +++ b/arch/x86/mm/kasan_init_64.c
> @@ -416,8 +416,8 @@ void __init kasan_init(void)
> * area is randomly placed somewhere in the 512GiB range and mapping
> * the entire 512GiB range is prohibitively expensive.
> */
> - kasan_populate_early_shadow((void *)shadow_cea_begin,
> - (void *)shadow_cea_per_cpu_begin);
> + kasan_populate_shadow(shadow_cea_begin,
> + shadow_cea_per_cpu_begin, 0);
>
> kasan_populate_early_shadow((void *)shadow_cea_end,
> kasan_mem_to_shadow((void *)__START_KERNEL_map));

OK. It now looks like so:

https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=x86/mm&id=14ca169feec3cb442ef4d322f8f65ba360f42784

If the robots don't hate on it because I fat-fingered it or something
stupid, I'll go push it out tomorrow.

2022-11-15 22:32:44

by tip-bot2 for Sean Christopherson

Subject: [tip: x86/mm] x86/mm: Populate KASAN shadow for entire per-CPU range of CPU entry area

The following commit has been merged into the x86/mm branch of tip:

Commit-ID: 4917fc63dc646d6346f5d67ce8c10df874a6f4fe
Gitweb: https://git.kernel.org/tip/4917fc63dc646d6346f5d67ce8c10df874a6f4fe
Author: Sean Christopherson <[email protected]>
AuthorDate: Thu, 10 Nov 2022 20:35:01
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Tue, 15 Nov 2022 22:29:59 +01:00

x86/mm: Populate KASAN shadow for entire per-CPU range of CPU entry area

Populate a KASAN shadow for the entire possible per-CPU range of the CPU
entry area instead of requiring that each individual chunk map a shadow.
Mapping shadows individually is error prone, e.g. the per-CPU GDT mapping
was left behind, which can lead to not-present page faults during KASAN
validation if the kernel performs a software lookup into the GDT. The DS
buffer is also likely affected.

The motivation for mapping the per-CPU areas on-demand was to avoid
mapping the entire 512GiB range that's reserved for the CPU entry area,
shaving a few bytes by not creating shadows for potentially unused memory
was not a goal.

The bug is most easily reproduced by doing a sigreturn with a garbage
CS in the sigcontext, e.g.

int main(void)
{
struct sigcontext regs;

syscall(__NR_mmap, 0x1ffff000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);
syscall(__NR_mmap, 0x20000000ul, 0x1000000ul, 7ul, 0x32ul, -1, 0ul);
syscall(__NR_mmap, 0x21000000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);

memset(&regs, 0, sizeof(regs));
regs.cs = 0x1d0;
syscall(__NR_rt_sigreturn);
return 0;
}

to coerce the kernel into doing a GDT lookup to compute CS.base when
reading the instruction bytes on the subsequent #GP to determine whether
or not the #GP is something the kernel should handle, e.g. to fixup UMIP
violations or to emulate CLI/STI for IOPL=3 applications.

BUG: unable to handle page fault for address: fffffbc8379ace00
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 16c03a067 P4D 16c03a067 PUD 15b990067 PMD 15b98f067 PTE 0
Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 3 PID: 851 Comm: r2 Not tainted 6.1.0-rc3-next-20221103+ #432
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:kasan_check_range+0xdf/0x190
Call Trace:
<TASK>
get_desc+0xb0/0x1d0
insn_get_seg_base+0x104/0x270
insn_fetch_from_user+0x66/0x80
fixup_umip_exception+0xb1/0x530
exc_general_protection+0x181/0x210
asm_exc_general_protection+0x22/0x30
RIP: 0003:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 0003:0000000000000000 EFLAGS: 00000202
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000000001d0
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
</TASK>

Fixes: 9fd429c28073 ("x86/kasan: Map shadow for percpu pages on demand")
Reported-by: [email protected]
Suggested-by: Andrey Ryabinin <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Andrey Ryabinin <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/mm/cpu_entry_area.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/x86/mm/cpu_entry_area.c b/arch/x86/mm/cpu_entry_area.c
index d831aae..7c855df 100644
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -91,11 +91,6 @@ void cea_set_pte(void *cea_vaddr, phys_addr_t pa, pgprot_t flags)
static void __init
cea_map_percpu_pages(void *cea_vaddr, void *ptr, int pages, pgprot_t prot)
{
- phys_addr_t pa = per_cpu_ptr_to_phys(ptr);
-
- kasan_populate_shadow_for_vaddr(cea_vaddr, pages * PAGE_SIZE,
- early_pfn_to_nid(PFN_DOWN(pa)));
-
for ( ; pages; pages--, cea_vaddr+= PAGE_SIZE, ptr += PAGE_SIZE)
cea_set_pte(cea_vaddr, per_cpu_ptr_to_phys(ptr), prot);
}
@@ -195,6 +190,9 @@ static void __init setup_cpu_entry_area(unsigned int cpu)
pgprot_t tss_prot = PAGE_KERNEL;
#endif

+ kasan_populate_shadow_for_vaddr(cea, CPU_ENTRY_AREA_SIZE,
+ early_cpu_to_node(cpu));
+
cea_set_pte(&cea->gdt, get_cpu_gdt_paddr(cpu), gdt_prot);

cea_map_percpu_pages(&cea->entry_stack_page,

2022-11-15 22:33:04

by tip-bot2 for Sean Christopherson

Subject: [tip: x86/mm] x86/kasan: Add helpers to align shadow addresses up and down

The following commit has been merged into the x86/mm branch of tip:

Commit-ID: 74b5a69c2a577d4fdba581171e3ebf33cddbddc1
Gitweb: https://git.kernel.org/tip/74b5a69c2a577d4fdba581171e3ebf33cddbddc1
Author: Sean Christopherson <[email protected]>
AuthorDate: Thu, 10 Nov 2022 20:35:03
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Tue, 15 Nov 2022 22:29:59 +01:00

x86/kasan: Add helpers to align shadow addresses up and down

Add helpers to dedup code for aligning shadow address up/down to page
boundaries when translating an address to its shadow.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Andrey Ryabinin <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/mm/kasan_init_64.c | 40 +++++++++++++++++++-----------------
1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index ad7872a..afc5e12 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -316,22 +316,33 @@ void __init kasan_early_init(void)
kasan_map_early_shadow(init_top_pgt);
}

+static unsigned long kasan_mem_to_shadow_align_down(unsigned long va)
+{
+ unsigned long shadow = (unsigned long)kasan_mem_to_shadow((void *)va);
+
+ return round_down(shadow, PAGE_SIZE);
+}
+
+static unsigned long kasan_mem_to_shadow_align_up(unsigned long va)
+{
+ unsigned long shadow = (unsigned long)kasan_mem_to_shadow((void *)va);
+
+ return round_up(shadow, PAGE_SIZE);
+}
+
void __init kasan_populate_shadow_for_vaddr(void *va, size_t size, int nid)
{
unsigned long shadow_start, shadow_end;

- shadow_start = (unsigned long)kasan_mem_to_shadow(va);
- shadow_start = round_down(shadow_start, PAGE_SIZE);
- shadow_end = (unsigned long)kasan_mem_to_shadow(va + size);
- shadow_end = round_up(shadow_end, PAGE_SIZE);
-
+ shadow_start = kasan_mem_to_shadow_align_down((unsigned long)va);
+ shadow_end = kasan_mem_to_shadow_align_up((unsigned long)va + size);
kasan_populate_shadow(shadow_start, shadow_end, nid);
}

void __init kasan_init(void)
{
+ unsigned long shadow_cea_begin, shadow_cea_end;
int i;
- void *shadow_cea_begin, *shadow_cea_end;

memcpy(early_top_pgt, init_top_pgt, sizeof(early_top_pgt));

@@ -372,16 +383,9 @@ void __init kasan_init(void)
map_range(&pfn_mapped[i]);
}

- shadow_cea_begin = (void *)CPU_ENTRY_AREA_BASE;
- shadow_cea_begin = kasan_mem_to_shadow(shadow_cea_begin);
- shadow_cea_begin = (void *)round_down(
- (unsigned long)shadow_cea_begin, PAGE_SIZE);
-
- shadow_cea_end = (void *)(CPU_ENTRY_AREA_BASE +
- CPU_ENTRY_AREA_MAP_SIZE);
- shadow_cea_end = kasan_mem_to_shadow(shadow_cea_end);
- shadow_cea_end = (void *)round_up(
- (unsigned long)shadow_cea_end, PAGE_SIZE);
+ shadow_cea_begin = kasan_mem_to_shadow_align_down(CPU_ENTRY_AREA_BASE);
+ shadow_cea_end = kasan_mem_to_shadow_align_up(CPU_ENTRY_AREA_BASE +
+ CPU_ENTRY_AREA_MAP_SIZE);

kasan_populate_early_shadow(
kasan_mem_to_shadow((void *)PAGE_OFFSET + MAXMEM),
@@ -403,9 +407,9 @@ void __init kasan_init(void)

kasan_populate_early_shadow(
kasan_mem_to_shadow((void *)VMALLOC_END + 1),
- shadow_cea_begin);
+ (void *)shadow_cea_begin);

- kasan_populate_early_shadow(shadow_cea_end,
+ kasan_populate_early_shadow((void *)shadow_cea_end,
kasan_mem_to_shadow((void *)__START_KERNEL_map));

kasan_populate_shadow((unsigned long)kasan_mem_to_shadow(_stext),

2022-11-15 22:33:29

by tip-bot2 for Sean Christopherson

Subject: [tip: x86/mm] x86/kasan: Populate shadow for shared chunk of the CPU entry area

The following commit has been merged into the x86/mm branch of tip:

Commit-ID: f2089aa0cd8e52564240a93ea1e4bb643c0ed34c
Gitweb: https://git.kernel.org/tip/f2089aa0cd8e52564240a93ea1e4bb643c0ed34c
Author: Sean Christopherson <[email protected]>
AuthorDate: Thu, 10 Nov 2022 20:35:04
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Tue, 15 Nov 2022 22:30:00 +01:00

x86/kasan: Populate shadow for shared chunk of the CPU entry area

Populate the shadow for the shared portion of the CPU entry area, i.e.
the read-only IDT mapping, during KASAN initialization. A recent change
modified KASAN to map the per-CPU areas on-demand, but forgot to keep a
shadow for the common area that is shared amongst all CPUs.

Map the common area in KASAN init instead of letting idt_map_in_cea() do
the dirty work so that it Just Works in the unlikely event more shared
data is shoved into the CPU entry area.

The bug manifests as a not-present #PF when software attempts to lookup
an IDT entry, e.g. when KVM is handling IRQs on Intel CPUs (KVM performs
direct CALL to the IRQ handler to avoid the overhead of INTn):

BUG: unable to handle page fault for address: fffffbc0000001d8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 16c03a067 P4D 16c03a067 PUD 0
Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 5 PID: 901 Comm: repro Tainted: G W 6.1.0-rc3+ #410
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:kasan_check_range+0xdf/0x190
vmx_handle_exit_irqoff+0x152/0x290 [kvm_intel]
vcpu_run+0x1d89/0x2bd0 [kvm]
kvm_arch_vcpu_ioctl_run+0x3ce/0xa70 [kvm]
kvm_vcpu_ioctl+0x349/0x900 [kvm]
__x64_sys_ioctl+0xb8/0xf0
do_syscall_64+0x2b/0x50
entry_SYSCALL_64_after_hwframe+0x46/0xb0

Fixes: 9fd429c28073 ("x86/kasan: Map shadow for percpu pages on demand")
Reported-by: [email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/mm/kasan_init_64.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index afc5e12..0302491 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -341,7 +341,7 @@ void __init kasan_populate_shadow_for_vaddr(void *va, size_t size, int nid)

void __init kasan_init(void)
{
- unsigned long shadow_cea_begin, shadow_cea_end;
+ unsigned long shadow_cea_begin, shadow_cea_per_cpu_begin, shadow_cea_end;
int i;

memcpy(early_top_pgt, init_top_pgt, sizeof(early_top_pgt));
@@ -384,6 +384,7 @@ void __init kasan_init(void)
}

shadow_cea_begin = kasan_mem_to_shadow_align_down(CPU_ENTRY_AREA_BASE);
+ shadow_cea_per_cpu_begin = kasan_mem_to_shadow_align_up(CPU_ENTRY_AREA_PER_CPU);
shadow_cea_end = kasan_mem_to_shadow_align_up(CPU_ENTRY_AREA_BASE +
CPU_ENTRY_AREA_MAP_SIZE);

@@ -409,6 +410,15 @@ void __init kasan_init(void)
kasan_mem_to_shadow((void *)VMALLOC_END + 1),
(void *)shadow_cea_begin);

+ /*
+ * Populate the shadow for the shared portion of the CPU entry area.
+ * Shadows for the per-CPU areas are mapped on-demand, as each CPU's
+ * area is randomly placed somewhere in the 512GiB range and mapping
+ * the entire 512GiB range is prohibitively expensive.
+ */
+ kasan_populate_shadow(shadow_cea_begin,
+ shadow_cea_per_cpu_begin, 0);
+
kasan_populate_early_shadow((void *)shadow_cea_end,
kasan_mem_to_shadow((void *)__START_KERNEL_map));


2022-12-17 19:07:21

by tip-bot2 for Sean Christopherson

Subject: [tip: x86/mm] x86/kasan: Add helpers to align shadow addresses up and down

The following commit has been merged into the x86/mm branch of tip:

Commit-ID: bde258d97409f2a45243cb393a55ea9ecfc7aba5
Gitweb: https://git.kernel.org/tip/bde258d97409f2a45243cb393a55ea9ecfc7aba5
Author: Sean Christopherson <[email protected]>
AuthorDate: Thu, 10 Nov 2022 20:35:03
Committer: Dave Hansen <[email protected]>
CommitterDate: Thu, 15 Dec 2022 10:37:28 -08:00

x86/kasan: Add helpers to align shadow addresses up and down

Add helpers to dedup code for aligning shadow address up/down to page
boundaries when translating an address to its shadow.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Andrey Ryabinin <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/mm/kasan_init_64.c | 40 +++++++++++++++++++-----------------
1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index ad7872a..afc5e12 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -316,22 +316,33 @@ void __init kasan_early_init(void)
kasan_map_early_shadow(init_top_pgt);
}

+static unsigned long kasan_mem_to_shadow_align_down(unsigned long va)
+{
+ unsigned long shadow = (unsigned long)kasan_mem_to_shadow((void *)va);
+
+ return round_down(shadow, PAGE_SIZE);
+}
+
+static unsigned long kasan_mem_to_shadow_align_up(unsigned long va)
+{
+ unsigned long shadow = (unsigned long)kasan_mem_to_shadow((void *)va);
+
+ return round_up(shadow, PAGE_SIZE);
+}
+
void __init kasan_populate_shadow_for_vaddr(void *va, size_t size, int nid)
{
unsigned long shadow_start, shadow_end;

- shadow_start = (unsigned long)kasan_mem_to_shadow(va);
- shadow_start = round_down(shadow_start, PAGE_SIZE);
- shadow_end = (unsigned long)kasan_mem_to_shadow(va + size);
- shadow_end = round_up(shadow_end, PAGE_SIZE);
-
+ shadow_start = kasan_mem_to_shadow_align_down((unsigned long)va);
+ shadow_end = kasan_mem_to_shadow_align_up((unsigned long)va + size);
kasan_populate_shadow(shadow_start, shadow_end, nid);
}

void __init kasan_init(void)
{
+ unsigned long shadow_cea_begin, shadow_cea_end;
int i;
- void *shadow_cea_begin, *shadow_cea_end;

memcpy(early_top_pgt, init_top_pgt, sizeof(early_top_pgt));

@@ -372,16 +383,9 @@ void __init kasan_init(void)
map_range(&pfn_mapped[i]);
}

- shadow_cea_begin = (void *)CPU_ENTRY_AREA_BASE;
- shadow_cea_begin = kasan_mem_to_shadow(shadow_cea_begin);
- shadow_cea_begin = (void *)round_down(
- (unsigned long)shadow_cea_begin, PAGE_SIZE);
-
- shadow_cea_end = (void *)(CPU_ENTRY_AREA_BASE +
- CPU_ENTRY_AREA_MAP_SIZE);
- shadow_cea_end = kasan_mem_to_shadow(shadow_cea_end);
- shadow_cea_end = (void *)round_up(
- (unsigned long)shadow_cea_end, PAGE_SIZE);
+ shadow_cea_begin = kasan_mem_to_shadow_align_down(CPU_ENTRY_AREA_BASE);
+ shadow_cea_end = kasan_mem_to_shadow_align_up(CPU_ENTRY_AREA_BASE +
+ CPU_ENTRY_AREA_MAP_SIZE);

kasan_populate_early_shadow(
kasan_mem_to_shadow((void *)PAGE_OFFSET + MAXMEM),
@@ -403,9 +407,9 @@ void __init kasan_init(void)

kasan_populate_early_shadow(
kasan_mem_to_shadow((void *)VMALLOC_END + 1),
- shadow_cea_begin);
+ (void *)shadow_cea_begin);

- kasan_populate_early_shadow(shadow_cea_end,
+ kasan_populate_early_shadow((void *)shadow_cea_end,
kasan_mem_to_shadow((void *)__START_KERNEL_map));

kasan_populate_shadow((unsigned long)kasan_mem_to_shadow(_stext),

2022-12-17 19:10:06

by tip-bot2 for Sean Christopherson

Subject: [tip: x86/mm] x86/kasan: Populate shadow for shared chunk of the CPU entry area

The following commit has been merged into the x86/mm branch of tip:

Commit-ID: 1cfaac2400c73378e78182a706be0f3ac8b93cd7
Gitweb: https://git.kernel.org/tip/1cfaac2400c73378e78182a706be0f3ac8b93cd7
Author: Sean Christopherson <[email protected]>
AuthorDate: Thu, 10 Nov 2022 20:35:04
Committer: Dave Hansen <[email protected]>
CommitterDate: Thu, 15 Dec 2022 10:37:28 -08:00

x86/kasan: Populate shadow for shared chunk of the CPU entry area

Populate the shadow for the shared portion of the CPU entry area, i.e.
the read-only IDT mapping, during KASAN initialization. A recent change
modified KASAN to map the per-CPU areas on-demand, but forgot to keep a
shadow for the common area that is shared amongst all CPUs.

Map the common area in KASAN init instead of letting idt_map_in_cea() do
the dirty work so that it Just Works in the unlikely event more shared
data is shoved into the CPU entry area.

The bug manifests as a not-present #PF when software attempts to lookup
an IDT entry, e.g. when KVM is handling IRQs on Intel CPUs (KVM performs
direct CALL to the IRQ handler to avoid the overhead of INTn):

BUG: unable to handle page fault for address: fffffbc0000001d8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 16c03a067 P4D 16c03a067 PUD 0
Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 5 PID: 901 Comm: repro Tainted: G W 6.1.0-rc3+ #410
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:kasan_check_range+0xdf/0x190
vmx_handle_exit_irqoff+0x152/0x290 [kvm_intel]
vcpu_run+0x1d89/0x2bd0 [kvm]
kvm_arch_vcpu_ioctl_run+0x3ce/0xa70 [kvm]
kvm_vcpu_ioctl+0x349/0x900 [kvm]
__x64_sys_ioctl+0xb8/0xf0
do_syscall_64+0x2b/0x50
entry_SYSCALL_64_after_hwframe+0x46/0xb0

Fixes: 9fd429c28073 ("x86/kasan: Map shadow for percpu pages on demand")
Reported-by: [email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/mm/kasan_init_64.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index afc5e12..0302491 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -341,7 +341,7 @@ void __init kasan_populate_shadow_for_vaddr(void *va, size_t size, int nid)

void __init kasan_init(void)
{
- unsigned long shadow_cea_begin, shadow_cea_end;
+ unsigned long shadow_cea_begin, shadow_cea_per_cpu_begin, shadow_cea_end;
int i;

memcpy(early_top_pgt, init_top_pgt, sizeof(early_top_pgt));
@@ -384,6 +384,7 @@ void __init kasan_init(void)
}

shadow_cea_begin = kasan_mem_to_shadow_align_down(CPU_ENTRY_AREA_BASE);
+ shadow_cea_per_cpu_begin = kasan_mem_to_shadow_align_up(CPU_ENTRY_AREA_PER_CPU);
shadow_cea_end = kasan_mem_to_shadow_align_up(CPU_ENTRY_AREA_BASE +
CPU_ENTRY_AREA_MAP_SIZE);

@@ -409,6 +410,15 @@ void __init kasan_init(void)
kasan_mem_to_shadow((void *)VMALLOC_END + 1),
(void *)shadow_cea_begin);

+ /*
+ * Populate the shadow for the shared portion of the CPU entry area.
+ * Shadows for the per-CPU areas are mapped on-demand, as each CPU's
+ * area is randomly placed somewhere in the 512GiB range and mapping
+ * the entire 512GiB range is prohibitively expensive.
+ */
+ kasan_populate_shadow(shadow_cea_begin,
+ shadow_cea_per_cpu_begin, 0);
+
kasan_populate_early_shadow((void *)shadow_cea_end,
kasan_mem_to_shadow((void *)__START_KERNEL_map));

2022-12-17 19:10:15

by tip-bot2 for Sean Christopherson

Subject: [tip: x86/mm] x86/mm: Populate KASAN shadow for entire per-CPU range of CPU entry area

The following commit has been merged into the x86/mm branch of tip:

Commit-ID: 97650148a15e0b30099d6175ffe278b9f55ec66a
Gitweb: https://git.kernel.org/tip/97650148a15e0b30099d6175ffe278b9f55ec66a
Author: Sean Christopherson <[email protected]>
AuthorDate: Thu, 10 Nov 2022 20:35:01
Committer: Dave Hansen <[email protected]>
CommitterDate: Thu, 15 Dec 2022 10:37:28 -08:00

x86/mm: Populate KASAN shadow for entire per-CPU range of CPU entry area

Populate a KASAN shadow for the entire possible per-CPU range of the CPU
entry area instead of requiring that each individual chunk map a shadow.
Mapping shadows individually is error prone, e.g. the per-CPU GDT mapping
was left behind, which can lead to not-present page faults during KASAN
validation if the kernel performs a software lookup into the GDT. The DS
buffer is also likely affected.

The motivation for mapping the per-CPU areas on-demand was to avoid
mapping the entire 512GiB range that's reserved for the CPU entry area,
shaving a few bytes by not creating shadows for potentially unused memory
was not a goal.

The bug is most easily reproduced by doing a sigreturn with a garbage
CS in the sigcontext, e.g.

int main(void)
{
struct sigcontext regs;

syscall(__NR_mmap, 0x1ffff000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);
syscall(__NR_mmap, 0x20000000ul, 0x1000000ul, 7ul, 0x32ul, -1, 0ul);
syscall(__NR_mmap, 0x21000000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);

memset(&regs, 0, sizeof(regs));
regs.cs = 0x1d0;
syscall(__NR_rt_sigreturn);
return 0;
}

to coerce the kernel into doing a GDT lookup to compute CS.base when
reading the instruction bytes on the subsequent #GP to determine whether
or not the #GP is something the kernel should handle, e.g. to fixup UMIP
violations or to emulate CLI/STI for IOPL=3 applications.

BUG: unable to handle page fault for address: fffffbc8379ace00
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 16c03a067 P4D 16c03a067 PUD 15b990067 PMD 15b98f067 PTE 0
Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 3 PID: 851 Comm: r2 Not tainted 6.1.0-rc3-next-20221103+ #432
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:kasan_check_range+0xdf/0x190
Call Trace:
<TASK>
get_desc+0xb0/0x1d0
insn_get_seg_base+0x104/0x270
insn_fetch_from_user+0x66/0x80
fixup_umip_exception+0xb1/0x530
exc_general_protection+0x181/0x210
asm_exc_general_protection+0x22/0x30
RIP: 0003:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 0003:0000000000000000 EFLAGS: 00000202
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000000001d0
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
</TASK>

Fixes: 9fd429c28073 ("x86/kasan: Map shadow for percpu pages on demand")
Reported-by: [email protected]
Suggested-by: Andrey Ryabinin <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Andrey Ryabinin <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/mm/cpu_entry_area.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/x86/mm/cpu_entry_area.c b/arch/x86/mm/cpu_entry_area.c
index d831aae..7c855df 100644
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -91,11 +91,6 @@ void cea_set_pte(void *cea_vaddr, phys_addr_t pa, pgprot_t flags)
static void __init
cea_map_percpu_pages(void *cea_vaddr, void *ptr, int pages, pgprot_t prot)
{
- phys_addr_t pa = per_cpu_ptr_to_phys(ptr);
-
- kasan_populate_shadow_for_vaddr(cea_vaddr, pages * PAGE_SIZE,
- early_pfn_to_nid(PFN_DOWN(pa)));
-
for ( ; pages; pages--, cea_vaddr+= PAGE_SIZE, ptr += PAGE_SIZE)
cea_set_pte(cea_vaddr, per_cpu_ptr_to_phys(ptr), prot);
}
@@ -195,6 +190,9 @@ static void __init setup_cpu_entry_area(unsigned int cpu)
pgprot_t tss_prot = PAGE_KERNEL;
#endif

+ kasan_populate_shadow_for_vaddr(cea, CPU_ENTRY_AREA_SIZE,
+ early_cpu_to_node(cpu));
+
cea_set_pte(&cea->gdt, get_cpu_gdt_paddr(cpu), gdt_prot);

cea_map_percpu_pages(&cea->entry_stack_page,