2018-06-07 17:35:01

by Yu-cheng Yu

Subject: [PATCH 0/9] Control Flow Enforcement - Part (2)

Summary of changes:

Shadow stack kernel config option;
Control protection exception; and
Shadow stack memory management.

A shadow stack PTE must be read-only and dirty. Changes
are made to:

Use the read-only and hardware-dirty combination exclusively
for shadow stack;

Use a spare PTE bit to indicate other PTE dirty conditions;
and

Handle shadow stack page faults.
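The read-only + hardware-dirty invariant the series relies on can be sketched in a small userspace model. The bit positions are taken from this series (_PAGE_BIT_RW=1, _PAGE_BIT_DIRTY_HW=6, _PAGE_BIT_DIRTY_SW=57); this is purely illustrative, not kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions from this series; DIRTY_SW is a spare software bit
 * in the 64-bit PTE. */
#define PTE_RW        (1ULL << 1)
#define PTE_DIRTY_HW  (1ULL << 6)
#define PTE_DIRTY_SW  (1ULL << 57)

/* The series reserves read-only + hardware-dirty exclusively for
 * shadow stack pages; all other dirty conditions use the software bit. */
static bool pte_is_shstk(uint64_t pte)
{
	return !(pte & PTE_RW) && (pte & PTE_DIRTY_HW);
}
```

Any PTE that is writable, or dirty only in the software bit, is therefore never mistaken for shadow stack memory.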

Yu-cheng Yu (9):
x86/cet: Control protection exception handler
x86/cet: Add Kconfig option for user-mode shadow stack
mm: Introduce VM_SHSTK for shadow stack memory
x86/mm: Change _PAGE_DIRTY to _PAGE_DIRTY_HW
x86/mm: Introduce _PAGE_DIRTY_SW
x86/mm: Introduce ptep_set_wrprotect_flush and related functions
x86/mm: Shadow stack page fault error checking
x86/cet: Handle shadow stack page fault
x86/cet: Handle THP/HugeTLB shadow stack page copying

arch/x86/Kconfig | 24 ++++++
arch/x86/entry/entry_32.S | 5 ++
arch/x86/entry/entry_64.S | 2 +-
arch/x86/include/asm/pgtable.h | 149 ++++++++++++++++++++++++++++++-----
arch/x86/include/asm/pgtable_types.h | 31 +++++---
arch/x86/include/asm/traps.h | 5 ++
arch/x86/kernel/idt.c | 1 +
arch/x86/kernel/relocate_kernel_64.S | 2 +-
arch/x86/kernel/traps.c | 61 ++++++++++++++
arch/x86/kvm/vmx.c | 2 +-
arch/x86/mm/fault.c | 11 +++
include/asm-generic/pgtable.h | 38 +++++++++
include/linux/mm.h | 8 ++
mm/huge_memory.c | 10 ++-
mm/hugetlb.c | 2 +-
mm/internal.h | 8 ++
mm/memory.c | 32 +++++++-
17 files changed, 353 insertions(+), 38 deletions(-)

--
2.15.1



2018-06-07 15:45:43

by Yu-cheng Yu

Subject: [PATCH 8/9] x86/cet: Handle shadow stack page fault

When a task does fork(), its shadow stack must be duplicated for
the child. However, the child may not actually use all pages of
of the copied shadow stack. This patch implements a flow that
is similar to copy-on-write of an anonymous page, but for shadow
stack memory. A shadow stack PTE must be RO and dirty. We
use this dirty bit requirement to trigger the copying of
shadow stack pages.

In copy_one_pte(), we clear the dirty bit from the shadow stack
PTE. On the next shadow stack access to the PTE, a page fault
occurs. At that time, we copy or re-use the page and fix
up the PTE.
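The fork-then-fault sequence described above can be modeled in userspace (same illustrative bit positions as used throughout this series; a sketch of the logic, not the kernel implementation):

```c
#include <assert.h>
#include <stdint.h>

#define PTE_RW        (1ULL << 1)
#define PTE_DIRTY_HW  (1ULL << 6)   /* set and read by hardware */
#define PTE_DIRTY_SW  (1ULL << 57)  /* software-only spare bit */

/* fork(): write-protect the PTE and demote the hardware dirty bit.
 * The resulting PTE is neither writable nor DIRTY_HW, so the next
 * shadow stack access (which requires RO + DIRTY_HW) faults. */
static uint64_t fork_copy_pte(uint64_t pte)
{
	if (pte & PTE_DIRTY_HW)
		pte = (pte & ~PTE_DIRTY_HW) | PTE_DIRTY_SW;
	return pte & ~PTE_RW;
}

/* Page fault: after copying or re-using the page, restore the
 * RO + DIRTY_HW encoding that marks a valid shadow stack PTE. */
static uint64_t fault_fix_shstk_pte(uint64_t pte)
{
	return (pte & ~(PTE_DIRTY_SW | PTE_RW)) | PTE_DIRTY_HW;
}
```

This mirrors copy-on-write for anonymous pages, except the fault is triggered by the shadow stack access itself rather than a write permission check.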

Signed-off-by: Yu-cheng Yu <[email protected]>
---
mm/memory.c | 32 +++++++++++++++++++++++++++++---
1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 01f5464e0fd2..275c7fb3fc96 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1022,7 +1022,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
* in the parent and the child
*/
if (is_cow_mapping(vm_flags)) {
- ptep_set_wrprotect(src_mm, addr, src_pte);
+ ptep_set_wrprotect_flush(vma, addr, src_pte);
pte = pte_wrprotect(pte);
}

@@ -2444,7 +2444,13 @@ static inline void wp_page_reuse(struct vm_fault *vmf)

flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte));
entry = pte_mkyoung(vmf->orig_pte);
- entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+
+ if (is_shstk_mapping(vma->vm_flags))
+ entry = pte_mkdirty_shstk(entry);
+ else
+ entry = pte_mkdirty(entry);
+
+ entry = maybe_mkwrite(entry, vma);
if (ptep_set_access_flags(vma, vmf->address, vmf->pte, entry, 1))
update_mmu_cache(vma, vmf->address, vmf->pte);
pte_unmap_unlock(vmf->pte, vmf->ptl);
@@ -2517,7 +2523,11 @@ static int wp_page_copy(struct vm_fault *vmf)
}
flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte));
entry = mk_pte(new_page, vma->vm_page_prot);
- entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+ if (is_shstk_mapping(vma->vm_flags))
+ entry = pte_mkdirty_shstk(entry);
+ else
+ entry = pte_mkdirty(entry);
+ entry = maybe_mkwrite(entry, vma);
/*
* Clear the pte entry and flush it first, before updating the
* pte with the new entry. This will avoid a race condition
@@ -3192,6 +3202,14 @@ static int do_anonymous_page(struct vm_fault *vmf)
mem_cgroup_commit_charge(page, memcg, false, false);
lru_cache_add_active_or_unevictable(page, vma);
setpte:
+ /*
+ * If this is within a shadow stack mapping, mark
+ * the PTE dirty. We don't use pte_mkdirty(),
+ * because the PTE must have _PAGE_DIRTY_HW set.
+ */
+ if (is_shstk_mapping(vma->vm_flags))
+ entry = pte_mkdirty_shstk(entry);
+
set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);

/* No need to invalidate - it was non-present before */
@@ -3974,6 +3992,14 @@ static int handle_pte_fault(struct vm_fault *vmf)
entry = vmf->orig_pte;
if (unlikely(!pte_same(*vmf->pte, entry)))
goto unlock;
+
+ /*
+ * Shadow stack PTEs are copy-on-access, so run do_wp_page()
+ * handling on them whether or not this is a write fault.
+ */
+ if (is_shstk_mapping(vmf->vma->vm_flags))
+ return do_wp_page(vmf);
+
if (vmf->flags & FAULT_FLAG_WRITE) {
if (!pte_write(entry))
return do_wp_page(vmf);
--
2.15.1


2018-06-07 15:45:46

by Yu-cheng Yu

Subject: [PATCH 9/9] x86/cet: Handle THP/HugeTLB shadow stack page copying

This patch implements THP shadow stack memory copying in the
same way as the previous patch does for regular PTEs.

In copy_huge_pmd(), we clear the dirty bit from the PMD. On the
next shadow stack access to the PMD, a page fault occurs. At
that time, the page is copied or re-used and the PMD is fixed up.

Signed-off-by: Yu-cheng Yu <[email protected]>
---
mm/huge_memory.c | 10 +++++++++-
mm/hugetlb.c | 2 +-
2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a3a1815f8e11..c6e72ccc4274 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -600,6 +600,8 @@ static int __do_huge_pmd_anonymous_page(struct vm_fault *vmf, struct page *page,

entry = mk_huge_pmd(page, vma->vm_page_prot);
entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
+ if (is_shstk_mapping(vma->vm_flags))
+ entry = pmd_mkdirty_shstk(entry);
page_add_new_anon_rmap(page, vma, haddr, true);
mem_cgroup_commit_charge(page, memcg, false, true);
lru_cache_add_active_or_unevictable(page, vma);
@@ -976,7 +978,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
mm_inc_nr_ptes(dst_mm);
pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);

- pmdp_set_wrprotect(src_mm, addr, src_pmd);
+ pmdp_set_wrprotect_flush(vma, addr, src_pmd);
pmd = pmd_mkold(pmd_wrprotect(pmd));
set_pmd_at(dst_mm, addr, dst_pmd, pmd);

@@ -1196,6 +1198,8 @@ static int do_huge_pmd_wp_page_fallback(struct vm_fault *vmf, pmd_t orig_pmd,
pte_t entry;
entry = mk_pte(pages[i], vma->vm_page_prot);
entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+ if (is_shstk_mapping(vma->vm_flags))
+ entry = pte_mkdirty_shstk(entry);
memcg = (void *)page_private(pages[i]);
set_page_private(pages[i], 0);
page_add_new_anon_rmap(pages[i], vmf->vma, haddr, false);
@@ -1280,6 +1284,8 @@ int do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
pmd_t entry;
entry = pmd_mkyoung(orig_pmd);
entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
+ if (is_shstk_mapping(vma->vm_flags))
+ entry = pmd_mkdirty_shstk(entry);
if (pmdp_set_access_flags(vma, haddr, vmf->pmd, entry, 1))
update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
ret |= VM_FAULT_WRITE;
@@ -1350,6 +1356,8 @@ int do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
pmd_t entry;
entry = mk_huge_pmd(new_page, vma->vm_page_prot);
entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
+ if (is_shstk_mapping(vma->vm_flags))
+ entry = pmd_mkdirty_shstk(entry);
pmdp_huge_clear_flush_notify(vma, haddr, vmf->pmd);
page_add_new_anon_rmap(new_page, vma, haddr, true);
mem_cgroup_commit_charge(new_page, memcg, false, true);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 218679138255..d694cfab9f90 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3293,7 +3293,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
*
* See Documentation/vm/mmu_notifier.txt
*/
- huge_ptep_set_wrprotect(src, addr, src_pte);
+ huge_ptep_set_wrprotect_flush(vma, addr, src_pte);
}
entry = huge_ptep_get(src_pte);
ptepage = pte_page(entry);
--
2.15.1


2018-06-07 15:46:34

by Yu-cheng Yu

Subject: [PATCH 6/9] x86/mm: Introduce ptep_set_wrprotect_flush and related functions

The functions ptep_set_wrprotect() and huge_ptep_set_wrprotect()
are used by copy_page_range() and copy_hugetlb_page_range() to
copy PTEs.

On x86, when the shadow stack is enabled, only a shadow stack
PTE has the read-only and _PAGE_DIRTY_HW combination. Upon
making a dirty PTE read-only, we move its _PAGE_DIRTY_HW to
_PAGE_DIRTY_SW.

When ptep_set_wrprotect() moves _PAGE_DIRTY_HW to _PAGE_DIRTY_SW,
if the PTE is writable and the mm is shared, another task could
race to set _PAGE_DIRTY_HW again.

Introduce ptep_set_wrprotect_flush(), pmdp_set_wrprotect_flush(),
and huge_ptep_set_wrprotect_flush() to make sure this does not
happen.
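The race the flush variants close can be sketched in a userspace model. Here `mm_users > 1` stands in for the kernel's shared-mm check, and the TLB flush itself is represented by a flag (illustrative bit positions as elsewhere in the series):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_RW        (1ULL << 1)
#define PTE_DIRTY_HW  (1ULL << 6)
#define PTE_DIRTY_SW  (1ULL << 57)

/* Model of ptep_set_wrprotect_flush(): if the PTE was writable and
 * the mm is shared, another CPU could still write through a stale
 * writable TLB entry and have hardware re-set DIRTY_HW after we
 * demote it, accidentally forming the RO + DIRTY_HW shadow stack
 * encoding.  So in that case the entry is cleared and flushed first
 * (modeled here by setting *flushed). */
static uint64_t wrprotect_flush(uint64_t pte, int mm_users, bool *flushed)
{
	bool was_rw = pte & PTE_RW;

	*flushed = was_rw && mm_users > 1;
	pte &= ~PTE_RW;
	if (pte & PTE_DIRTY_HW)
		pte = (pte & ~PTE_DIRTY_HW) | PTE_DIRTY_SW;
	return pte;
}
```

When the PTE was already read-only, or the mm has a single user, no flush is needed and the helper degenerates to the plain wrprotect-and-demote sequence.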

Signed-off-by: Yu-cheng Yu <[email protected]>
---
arch/x86/include/asm/pgtable.h | 56 +++++++++++++++++++++++++++++++++++-------
include/asm-generic/pgtable.h | 26 ++++++++++++++++++++
2 files changed, 73 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 0996f8a6979a..1053b940b35c 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1148,11 +1148,27 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
return pte;
}

-#define __HAVE_ARCH_PTEP_SET_WRPROTECT
-static inline void ptep_set_wrprotect(struct mm_struct *mm,
- unsigned long addr, pte_t *ptep)
-{
- clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
+#define __HAVE_ARCH_PTEP_SET_WRPROTECT_FLUSH
+extern pte_t ptep_clear_flush(struct vm_area_struct *vma,
+ unsigned long address,
+ pte_t *ptep);
+static inline void ptep_set_wrprotect_flush(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep)
+{
+ bool rw;
+
+ rw = test_and_clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
+ if (IS_ENABLED(CONFIG_X86_INTEL_SHADOW_STACK_USER)) {
+ struct mm_struct *mm = vma->vm_mm;
+ pte_t pte;
+
+ if (rw && (atomic_read(&mm->mm_users) > 1))
+ pte = ptep_clear_flush(vma, addr, ptep);
+ else
+ pte = *ptep;
+ pte = pte_move_flags(pte, _PAGE_DIRTY_HW, _PAGE_DIRTY_SW);
+ set_pte_at(mm, addr, ptep, pte);
+ }
}

#define flush_tlb_fix_spurious_fault(vma, address) do { } while (0)
@@ -1198,11 +1214,33 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm,
return native_pudp_get_and_clear(pudp);
}

-#define __HAVE_ARCH_PMDP_SET_WRPROTECT
-static inline void pmdp_set_wrprotect(struct mm_struct *mm,
- unsigned long addr, pmd_t *pmdp)
+#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT_FLUSH
+static inline void huge_ptep_set_wrprotect_flush(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep)
{
- clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp);
+ ptep_set_wrprotect_flush(vma, addr, ptep);
+}
+
+#define __HAVE_ARCH_PMDP_SET_WRPROTECT_FLUSH
+extern pmd_t pmdp_huge_clear_flush(struct vm_area_struct *vma,
+ unsigned long address,
+ pmd_t *pmdp);
+static inline void pmdp_set_wrprotect_flush(struct vm_area_struct *vma,
+ unsigned long addr, pmd_t *pmdp)
+{
+ bool rw;
+
+ rw = test_and_clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp);
+ if (IS_ENABLED(CONFIG_X86_INTEL_SHADOW_STACK_USER)) {
+ struct mm_struct *mm = vma->vm_mm;
+ pmd_t pmd;
+
+ if (rw && (atomic_read(&mm->mm_users) > 1))
+ pmd = pmdp_huge_clear_flush(vma, addr, pmdp);
+ else
+ pmd = *pmdp;
+ pmd = pmd_move_flags(pmd, _PAGE_DIRTY_HW, _PAGE_DIRTY_SW);
+ set_pmd_at(mm, addr, pmdp, pmd);
+ }
}

#define pud_write pud_write
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 3f6f998509f0..9bcfdfc045bb 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -121,6 +121,15 @@ static inline int pmdp_clear_flush_young(struct vm_area_struct *vma,
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
#endif

+#ifndef __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT_FLUSH
+static inline void huge_ptep_set_wrprotect_flush(struct vm_area_struct *vma,
+ unsigned long addr,
+ pte_t *ptep)
+{
+ huge_ptep_set_wrprotect(vma->vm_mm, addr, ptep);
+}
+#endif
+
#ifndef __HAVE_ARCH_PTEP_GET_AND_CLEAR
static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
unsigned long address,
@@ -226,6 +235,15 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addres
}
#endif

+#ifndef __HAVE_ARCH_PTEP_SET_WRPROTECT_FLUSH
+static inline void ptep_set_wrprotect_flush(struct vm_area_struct *vma,
+ unsigned long address,
+ pte_t *ptep)
+{
+ ptep_set_wrprotect(vma->vm_mm, address, ptep);
+}
+#endif
+
#ifndef pte_savedwrite
#define pte_savedwrite pte_write
#endif
@@ -266,6 +284,14 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm,
}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
#endif
+#ifndef __HAVE_ARCH_PMDP_SET_WRPROTECT_FLUSH
+static inline void pmdp_set_wrprotect_flush(struct vm_area_struct *vma,
+ unsigned long address,
+ pmd_t *pmdp)
+{
+ pmdp_set_wrprotect(vma->vm_mm, address, pmdp);
+}
+#endif
#ifndef __HAVE_ARCH_PUDP_SET_WRPROTECT
#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
static inline void pudp_set_wrprotect(struct mm_struct *mm,
--
2.15.1


2018-06-07 15:47:21

by Yu-cheng Yu

Subject: [PATCH 5/9] x86/mm: Introduce _PAGE_DIRTY_SW

The PTE DIRTY bit indicates a few conditions:

(1) When the processor writes to a memory page, the page's
PTE is (R/W + _PAGE_DIRTY_HW);
(2) When a modified page is shared from fork(), its PTE is
(R/O + _PAGE_DIRTY_HW);
(3) When access_remote_vm() has tried to write to a read-
only page with (FOLL_FORCE | FOLL_WRITE), the PTE is
(R/O + _PAGE_DIRTY_HW);
(4) A shadow stack memory page is required to be set as
(R/O + _PAGE_DIRTY_HW);

In case (1) above, the DIRTY bit is set by the processor;
in the other cases, it is set by software. However, the
processor reads the DIRTY bit only in case (4), to verify
a valid shadow stack page.

To make (R/O + _PAGE_DIRTY_HW) exclusive to shadow stack,
we introduce _PAGE_BIT_DIRTY_SW, a spare bit of the 64-bit
PTE, to replace _PAGE_BIT_DIRTY for cases (2), (3), and (4).

This results in the following possible PTE settings:

Modified PTE: (R/W + _PAGE_DIRTY_HW)
Modified and shared PTE: (R/O + _PAGE_DIRTY_SW)
R/O PTE was (FOLL_FORCE | FOLL_WRITE): (R/O + _PAGE_DIRTY_SW)
Shadow stack PTE: (R/O + _PAGE_DIRTY_HW)
Shared shadow stack PTE: (R/O + _PAGE_DIRTY_SW)

Note that _PAGE_BIT_DIRTY_SW is used only in R/O PTEs and
not in R/W PTEs.

When this patch is applied, six free bits remain in the
64-bit PTE. There are no more free bits in the 32-bit
PTE (except for PAE), and shadow stack is not implemented
for the 32-bit kernel.
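The dirty-bit selection behind the table above can be sketched as a userspace model of pte_mkdirty() and pte_wrprotect() as modified by this patch (illustrative only; bit positions as elsewhere in the series):

```c
#include <assert.h>
#include <stdint.h>

#define PTE_RW        (1ULL << 1)
#define PTE_DIRTY_HW  (1ULL << 6)
#define PTE_DIRTY_SW  (1ULL << 57)

/* pte_mkdirty(): a writable PTE takes the hardware dirty bit; a
 * read-only PTE takes the software bit, keeping RO + DIRTY_HW
 * reserved for shadow stack. */
static uint64_t mkdirty(uint64_t pte)
{
	return pte | ((pte & PTE_RW) ? PTE_DIRTY_HW : PTE_DIRTY_SW);
}

/* pte_wrprotect(): demote DIRTY_HW to DIRTY_SW while clearing RW,
 * so write-protecting a dirty PTE never produces the shadow stack
 * encoding by accident. */
static uint64_t wrprotect(uint64_t pte)
{
	if (pte & PTE_DIRTY_HW)
		pte = (pte & ~PTE_DIRTY_HW) | PTE_DIRTY_SW;
	return pte & ~PTE_RW;
}
```

Only pte_mkdirty_shstk() (not modeled here) sets DIRTY_HW on a read-only PTE, which is exactly what makes it the dedicated constructor for shadow stack PTEs.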

Signed-off-by: Yu-cheng Yu <[email protected]>
---
arch/x86/include/asm/pgtable.h | 91 ++++++++++++++++++++++++++++++++----
arch/x86/include/asm/pgtable_types.h | 14 +++++-
include/asm-generic/pgtable.h | 12 +++++
3 files changed, 105 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 00b5e79c09a6..0996f8a6979a 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -116,9 +116,9 @@ extern pmdval_t early_pmd_flags;
* The following only work if pte_present() is true.
* Undefined behaviour if not..
*/
-static inline int pte_dirty(pte_t pte)
+static inline bool pte_dirty(pte_t pte)
{
- return pte_flags(pte) & _PAGE_DIRTY;
+ return pte_flags(pte) & _PAGE_DIRTY_BITS;
}


@@ -140,9 +140,9 @@ static inline int pte_young(pte_t pte)
return pte_flags(pte) & _PAGE_ACCESSED;
}

-static inline int pmd_dirty(pmd_t pmd)
+static inline bool pmd_dirty(pmd_t pmd)
{
- return pmd_flags(pmd) & _PAGE_DIRTY;
+ return pmd_flags(pmd) & _PAGE_DIRTY_BITS;
}

static inline int pmd_young(pmd_t pmd)
@@ -150,9 +150,9 @@ static inline int pmd_young(pmd_t pmd)
return pmd_flags(pmd) & _PAGE_ACCESSED;
}

-static inline int pud_dirty(pud_t pud)
+static inline bool pud_dirty(pud_t pud)
{
- return pud_flags(pud) & _PAGE_DIRTY;
+ return pud_flags(pud) & _PAGE_DIRTY_BITS;
}

static inline int pud_young(pud_t pud)
@@ -281,9 +281,23 @@ static inline pte_t pte_clear_flags(pte_t pte, pteval_t clear)
return native_make_pte(v & ~clear);
}

+#if defined(CONFIG_X86_INTEL_SHADOW_STACK_USER)
+static inline pte_t pte_move_flags(pte_t pte, pteval_t from, pteval_t to)
+{
+ if (pte_flags(pte) & from)
+ pte = pte_set_flags(pte_clear_flags(pte, from), to);
+ return pte;
+}
+#else
+static inline pte_t pte_move_flags(pte_t pte, pteval_t from, pteval_t to)
+{
+ return pte;
+}
+#endif
+
static inline pte_t pte_mkclean(pte_t pte)
{
- return pte_clear_flags(pte, _PAGE_DIRTY);
+ return pte_clear_flags(pte, _PAGE_DIRTY_BITS);
}

static inline pte_t pte_mkold(pte_t pte)
@@ -293,6 +307,7 @@ static inline pte_t pte_mkold(pte_t pte)

static inline pte_t pte_wrprotect(pte_t pte)
{
+ pte = pte_move_flags(pte, _PAGE_DIRTY_HW, _PAGE_DIRTY_SW);
return pte_clear_flags(pte, _PAGE_RW);
}

@@ -302,9 +317,18 @@ static inline pte_t pte_mkexec(pte_t pte)
}

static inline pte_t pte_mkdirty(pte_t pte)
+{
+ pteval_t dirty = (!IS_ENABLED(CONFIG_X86_INTEL_SHADOW_STACK_USER) ||
+ pte_write(pte)) ? _PAGE_DIRTY_HW : _PAGE_DIRTY_SW;
+ return pte_set_flags(pte, dirty | _PAGE_SOFT_DIRTY);
+}
+
+#ifdef CONFIG_ARCH_HAS_SHSTK
+static inline pte_t pte_mkdirty_shstk(pte_t pte)
{
return pte_set_flags(pte, _PAGE_DIRTY_HW | _PAGE_SOFT_DIRTY);
}
+#endif

static inline pte_t pte_mkyoung(pte_t pte)
{
@@ -313,6 +337,7 @@ static inline pte_t pte_mkyoung(pte_t pte)

static inline pte_t pte_mkwrite(pte_t pte)
{
+ pte = pte_move_flags(pte, _PAGE_DIRTY_SW, _PAGE_DIRTY_HW);
return pte_set_flags(pte, _PAGE_RW);
}

@@ -360,6 +385,20 @@ static inline pmd_t pmd_clear_flags(pmd_t pmd, pmdval_t clear)
return native_make_pmd(v & ~clear);
}

+#if defined(CONFIG_X86_INTEL_SHADOW_STACK_USER)
+static inline pmd_t pmd_move_flags(pmd_t pmd, pmdval_t from, pmdval_t to)
+{
+ if (pmd_flags(pmd) & from)
+ pmd = pmd_set_flags(pmd_clear_flags(pmd, from), to);
+ return pmd;
+}
+#else
+static inline pmd_t pmd_move_flags(pmd_t pmd, pmdval_t from, pmdval_t to)
+{
+ return pmd;
+}
+#endif
+
static inline pmd_t pmd_mkold(pmd_t pmd)
{
return pmd_clear_flags(pmd, _PAGE_ACCESSED);
@@ -367,18 +406,29 @@ static inline pmd_t pmd_mkold(pmd_t pmd)

static inline pmd_t pmd_mkclean(pmd_t pmd)
{
- return pmd_clear_flags(pmd, _PAGE_DIRTY);
+ return pmd_clear_flags(pmd, _PAGE_DIRTY_BITS);
}

static inline pmd_t pmd_wrprotect(pmd_t pmd)
{
+ pmd = pmd_move_flags(pmd, _PAGE_DIRTY_HW, _PAGE_DIRTY_SW);
return pmd_clear_flags(pmd, _PAGE_RW);
}

static inline pmd_t pmd_mkdirty(pmd_t pmd)
+{
+ pmdval_t dirty = (!IS_ENABLED(CONFIG_X86_INTEL_SHADOW_STACK_USER) ||
+ (pmd_flags(pmd) & _PAGE_RW)) ?
+ _PAGE_DIRTY_HW : _PAGE_DIRTY_SW;
+ return pmd_set_flags(pmd, dirty | _PAGE_SOFT_DIRTY);
+}
+
+#ifdef CONFIG_ARCH_HAS_SHSTK
+static inline pmd_t pmd_mkdirty_shstk(pmd_t pmd)
{
return pmd_set_flags(pmd, _PAGE_DIRTY_HW | _PAGE_SOFT_DIRTY);
}
+#endif

static inline pmd_t pmd_mkdevmap(pmd_t pmd)
{
@@ -397,6 +447,7 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd)

static inline pmd_t pmd_mkwrite(pmd_t pmd)
{
+ pmd = pmd_move_flags(pmd, _PAGE_DIRTY_SW, _PAGE_DIRTY_HW);
return pmd_set_flags(pmd, _PAGE_RW);
}

@@ -419,6 +470,20 @@ static inline pud_t pud_clear_flags(pud_t pud, pudval_t clear)
return native_make_pud(v & ~clear);
}

+#if defined(CONFIG_X86_INTEL_SHADOW_STACK_USER)
+static inline pud_t pud_move_flags(pud_t pud, pudval_t from, pudval_t to)
+{
+ if (pud_flags(pud) & from)
+ pud = pud_set_flags(pud_clear_flags(pud, from), to);
+ return pud;
+}
+#else
+static inline pud_t pud_move_flags(pud_t pud, pudval_t from, pudval_t to)
+{
+ return pud;
+}
+#endif
+
static inline pud_t pud_mkold(pud_t pud)
{
return pud_clear_flags(pud, _PAGE_ACCESSED);
@@ -426,17 +491,22 @@ static inline pud_t pud_mkold(pud_t pud)

static inline pud_t pud_mkclean(pud_t pud)
{
- return pud_clear_flags(pud, _PAGE_DIRTY);
+ return pud_clear_flags(pud, _PAGE_DIRTY_BITS);
}

static inline pud_t pud_wrprotect(pud_t pud)
{
+ pud = pud_move_flags(pud, _PAGE_DIRTY_HW, _PAGE_DIRTY_SW);
return pud_clear_flags(pud, _PAGE_RW);
}

static inline pud_t pud_mkdirty(pud_t pud)
{
- return pud_set_flags(pud, _PAGE_DIRTY_HW | _PAGE_SOFT_DIRTY);
+ pudval_t dirty = (!IS_ENABLED(CONFIG_X86_INTEL_SHADOW_STACK_USER) ||
+ (pud_flags(pud) & _PAGE_RW)) ?
+ _PAGE_DIRTY_HW : _PAGE_DIRTY_SW;
+
+ return pud_set_flags(pud, dirty | _PAGE_SOFT_DIRTY);
}

static inline pud_t pud_mkdevmap(pud_t pud)
@@ -456,6 +526,7 @@ static inline pud_t pud_mkyoung(pud_t pud)

static inline pud_t pud_mkwrite(pud_t pud)
{
+ pud = pud_move_flags(pud, _PAGE_DIRTY_SW, _PAGE_DIRTY_HW);
return pud_set_flags(pud, _PAGE_RW);
}

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 2ac5d46d7c49..0907adb56197 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -23,6 +23,7 @@
#define _PAGE_BIT_SOFTW2 10 /* " */
#define _PAGE_BIT_SOFTW3 11 /* " */
#define _PAGE_BIT_PAT_LARGE 12 /* On 2MB or 1GB pages */
+#define _PAGE_BIT_SOFTW5 57 /* available for programmer */
#define _PAGE_BIT_SOFTW4 58 /* available for programmer */
#define _PAGE_BIT_PKEY_BIT0 59 /* Protection Keys, bit 1/4 */
#define _PAGE_BIT_PKEY_BIT1 60 /* Protection Keys, bit 2/4 */
@@ -34,6 +35,7 @@
#define _PAGE_BIT_CPA_TEST _PAGE_BIT_SOFTW1
#define _PAGE_BIT_SOFT_DIRTY _PAGE_BIT_SOFTW3 /* software dirty tracking */
#define _PAGE_BIT_DEVMAP _PAGE_BIT_SOFTW4
+#define _PAGE_BIT_DIRTY_SW _PAGE_BIT_SOFTW5 /* was written to */

/* If _PAGE_BIT_PRESENT is clear, we use these: */
/* - if the user mapped it with PROT_NONE; pte_present gives true */
@@ -109,6 +111,14 @@
#define _PAGE_DEVMAP (_AT(pteval_t, 0))
#endif

+#if defined(CONFIG_X86_INTEL_SHADOW_STACK_USER)
+#define _PAGE_DIRTY_SW (_AT(pteval_t, 1) << _PAGE_BIT_DIRTY_SW)
+#else
+#define _PAGE_DIRTY_SW (_AT(pteval_t, 0))
+#endif
+
+#define _PAGE_DIRTY_BITS (_PAGE_DIRTY_HW | _PAGE_DIRTY_SW)
+
#define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)

#define _PAGE_TABLE_NOENC (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |\
@@ -122,9 +132,9 @@
* instance, and is *not* included in this mask since
* pte_modify() does modify it.
*/
-#define _PAGE_CHG_MASK (PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT | \
+#define _PAGE_CHG_MASK (PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT | \
_PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY_HW | \
- _PAGE_SOFT_DIRTY)
+ _PAGE_DIRTY_SW | _PAGE_SOFT_DIRTY)
#define _HPAGE_CHG_MASK (_PAGE_CHG_MASK | _PAGE_PSE)

/*
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index f59639afaa39..3f6f998509f0 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -1097,4 +1097,16 @@ static inline void init_espfix_bsp(void) { }
#endif
#endif

+#ifndef CONFIG_ARCH_HAS_SHSTK
+static inline pte_t pte_mkdirty_shstk(pte_t pte)
+{
+ return pte;
+}
+
+static inline pmd_t pmd_mkdirty_shstk(pmd_t pmd)
+{
+ return pmd;
+}
+#endif
+
#endif /* _ASM_GENERIC_PGTABLE_H */
--
2.15.1


2018-06-07 15:47:40

by Yu-cheng Yu

Subject: [PATCH 3/9] mm: Introduce VM_SHSTK for shadow stack memory

VM_SHSTK indicates a shadow stack memory area.

A shadow stack PTE must be read-only and dirty. For PTEs that
are not shadow stack, we use a spare bit of the 64-bit PTE to
indicate dirty. The PTE changes are in the next patch.

There are no more spare bits in the 32-bit PTE (except for
PAE), and shadow stack is not implemented for the 32-bit kernel.

Signed-off-by: Yu-cheng Yu <[email protected]>
---
include/linux/mm.h | 8 ++++++++
mm/internal.h | 8 ++++++++
2 files changed, 16 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 02a616e2f17d..bf4388a8cc41 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -221,11 +221,13 @@ extern unsigned int kobjsize(const void *objp);
#define VM_HIGH_ARCH_BIT_2 34 /* bit only usable on 64-bit architectures */
#define VM_HIGH_ARCH_BIT_3 35 /* bit only usable on 64-bit architectures */
#define VM_HIGH_ARCH_BIT_4 36 /* bit only usable on 64-bit architectures */
+#define VM_HIGH_ARCH_BIT_5 37 /* bit only usable on 64-bit architectures */
#define VM_HIGH_ARCH_0 BIT(VM_HIGH_ARCH_BIT_0)
#define VM_HIGH_ARCH_1 BIT(VM_HIGH_ARCH_BIT_1)
#define VM_HIGH_ARCH_2 BIT(VM_HIGH_ARCH_BIT_2)
#define VM_HIGH_ARCH_3 BIT(VM_HIGH_ARCH_BIT_3)
#define VM_HIGH_ARCH_4 BIT(VM_HIGH_ARCH_BIT_4)
+#define VM_HIGH_ARCH_5 BIT(VM_HIGH_ARCH_BIT_5)
#endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */

#if defined(CONFIG_X86)
@@ -257,6 +259,12 @@ extern unsigned int kobjsize(const void *objp);
# define VM_MPX VM_NONE
#endif

+#ifdef CONFIG_X86_INTEL_SHADOW_STACK_USER
+# define VM_SHSTK VM_HIGH_ARCH_5
+#else
+# define VM_SHSTK VM_NONE
+#endif
+
#ifndef VM_GROWSUP
# define VM_GROWSUP VM_NONE
#endif
diff --git a/mm/internal.h b/mm/internal.h
index 502d14189794..44c64711a309 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -280,6 +280,14 @@ static inline bool is_data_mapping(vm_flags_t flags)
return (flags & (VM_WRITE | VM_SHARED | VM_STACK)) == VM_WRITE;
}

+/*
+ * Shadow stack area
+ */
+static inline bool is_shstk_mapping(vm_flags_t flags)
+{
+ return (flags & VM_SHSTK);
+}
+
/* mm/util.c */
void __vma_link_list(struct mm_struct *mm, struct vm_area_struct *vma,
struct vm_area_struct *prev, struct rb_node *rb_parent);
--
2.15.1


2018-06-07 15:48:04

by Yu-cheng Yu

Subject: [PATCH 2/9] x86/cet: Add Kconfig option for user-mode shadow stack

Introduce Kconfig option X86_INTEL_SHADOW_STACK_USER.

An application has shadow stack protection when all the following are
true:

(1) The kernel has X86_INTEL_SHADOW_STACK_USER enabled,
(2) The running processor supports the shadow stack,
(3) The application is built with shadow-stack-enabled tools and
libraries, and at runtime all dependent shared libraries support
shadow stack.

If this kernel config option is enabled but (2) or (3) above is
not true, the application runs without shadow stack protection.
Existing legacy applications will continue to work as before.

User-mode shadow stack protection is implemented only for the
64-bit kernel. 32-bit applications are supported in
compatibility mode.

Signed-off-by: Yu-cheng Yu <[email protected]>
---
arch/x86/Kconfig | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c07f492b871a..dd580d4910fc 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1925,6 +1925,30 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS

If unsure, say y.

+config X86_INTEL_CET
+ def_bool n
+
+config ARCH_HAS_SHSTK
+ def_bool n
+
+config X86_INTEL_SHADOW_STACK_USER
+ prompt "Intel Shadow Stack for user-mode"
+ def_bool n
+ depends on CPU_SUP_INTEL && X86_64
+ select X86_INTEL_CET
+ select ARCH_HAS_SHSTK
+ ---help---
+ Shadow stack provides hardware protection against program stack
+ corruption. Only when all the following are true will an application
+ have the shadow stack protection: the kernel supports it (i.e. this
+ feature is enabled), the application is compiled and linked with
+ shadow stack enabled, and the processor supports this feature.
+ When the kernel has this configuration enabled, existing
+ applications without shadow stack support will continue to
+ work, but without shadow stack protection.
+
+ If unsure, say y.
+
config EFI
bool "EFI runtime service support"
depends on ACPI
--
2.15.1


2018-06-07 15:48:11

by Yu-cheng Yu

Subject: [PATCH 4/9] x86/mm: Change _PAGE_DIRTY to _PAGE_DIRTY_HW

We are going to create _PAGE_DIRTY_SW for non-hardware, memory
management purposes. Rename _PAGE_DIRTY to _PAGE_DIRTY_HW and
_PAGE_BIT_DIRTY to _PAGE_BIT_DIRTY_HW to make the roles of
these PTE dirty bits clearer. There are no functional changes
in this patch.

Signed-off-by: Yu-cheng Yu <[email protected]>
---
arch/x86/include/asm/pgtable.h | 6 +++---
arch/x86/include/asm/pgtable_types.h | 17 +++++++++--------
arch/x86/kernel/relocate_kernel_64.S | 2 +-
arch/x86/kvm/vmx.c | 2 +-
4 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index f1633de5a675..00b5e79c09a6 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -303,7 +303,7 @@ static inline pte_t pte_mkexec(pte_t pte)

static inline pte_t pte_mkdirty(pte_t pte)
{
- return pte_set_flags(pte, _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
+ return pte_set_flags(pte, _PAGE_DIRTY_HW | _PAGE_SOFT_DIRTY);
}

static inline pte_t pte_mkyoung(pte_t pte)
@@ -377,7 +377,7 @@ static inline pmd_t pmd_wrprotect(pmd_t pmd)

static inline pmd_t pmd_mkdirty(pmd_t pmd)
{
- return pmd_set_flags(pmd, _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
+ return pmd_set_flags(pmd, _PAGE_DIRTY_HW | _PAGE_SOFT_DIRTY);
}

static inline pmd_t pmd_mkdevmap(pmd_t pmd)
@@ -436,7 +436,7 @@ static inline pud_t pud_wrprotect(pud_t pud)

static inline pud_t pud_mkdirty(pud_t pud)
{
- return pud_set_flags(pud, _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
+ return pud_set_flags(pud, _PAGE_DIRTY_HW | _PAGE_SOFT_DIRTY);
}

static inline pud_t pud_mkdevmap(pud_t pud)
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 1e5a40673953..2ac5d46d7c49 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -15,7 +15,7 @@
#define _PAGE_BIT_PWT 3 /* page write through */
#define _PAGE_BIT_PCD 4 /* page cache disabled */
#define _PAGE_BIT_ACCESSED 5 /* was accessed (raised by CPU) */
-#define _PAGE_BIT_DIRTY 6 /* was written to (raised by CPU) */
+#define _PAGE_BIT_DIRTY_HW 6 /* was written to (raised by CPU) */
#define _PAGE_BIT_PSE 7 /* 4 MB (or 2MB) page */
#define _PAGE_BIT_PAT 7 /* on 4KB pages */
#define _PAGE_BIT_GLOBAL 8 /* Global TLB entry PPro+ */
@@ -45,7 +45,7 @@
#define _PAGE_PWT (_AT(pteval_t, 1) << _PAGE_BIT_PWT)
#define _PAGE_PCD (_AT(pteval_t, 1) << _PAGE_BIT_PCD)
#define _PAGE_ACCESSED (_AT(pteval_t, 1) << _PAGE_BIT_ACCESSED)
-#define _PAGE_DIRTY (_AT(pteval_t, 1) << _PAGE_BIT_DIRTY)
+#define _PAGE_DIRTY_HW (_AT(pteval_t, 1) << _PAGE_BIT_DIRTY_HW)
#define _PAGE_PSE (_AT(pteval_t, 1) << _PAGE_BIT_PSE)
#define _PAGE_GLOBAL (_AT(pteval_t, 1) << _PAGE_BIT_GLOBAL)
#define _PAGE_SOFTW1 (_AT(pteval_t, 1) << _PAGE_BIT_SOFTW1)
@@ -73,7 +73,7 @@
_PAGE_PKEY_BIT3)

#if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
-#define _PAGE_KNL_ERRATUM_MASK (_PAGE_DIRTY | _PAGE_ACCESSED)
+#define _PAGE_KNL_ERRATUM_MASK (_PAGE_DIRTY_HW | _PAGE_ACCESSED)
#else
#define _PAGE_KNL_ERRATUM_MASK 0
#endif
@@ -112,9 +112,9 @@
#define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)

#define _PAGE_TABLE_NOENC (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |\
- _PAGE_ACCESSED | _PAGE_DIRTY)
+ _PAGE_ACCESSED | _PAGE_DIRTY_HW)
#define _KERNPG_TABLE_NOENC (_PAGE_PRESENT | _PAGE_RW | \
- _PAGE_ACCESSED | _PAGE_DIRTY)
+ _PAGE_ACCESSED | _PAGE_DIRTY_HW)

/*
* Set of bits not changed in pte_modify. The pte's
@@ -123,7 +123,7 @@
* pte_modify() does modify it.
*/
#define _PAGE_CHG_MASK (PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT | \
- _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY | \
+ _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY_HW | \
_PAGE_SOFT_DIRTY)
#define _HPAGE_CHG_MASK (_PAGE_CHG_MASK | _PAGE_PSE)

@@ -168,7 +168,8 @@ enum page_cache_mode {
_PAGE_ACCESSED)

#define __PAGE_KERNEL_EXEC \
- (_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_GLOBAL)
+ (_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY_HW | _PAGE_ACCESSED | \
+ _PAGE_GLOBAL)
#define __PAGE_KERNEL (__PAGE_KERNEL_EXEC | _PAGE_NX)

#define __PAGE_KERNEL_RO (__PAGE_KERNEL & ~_PAGE_RW)
@@ -187,7 +188,7 @@ enum page_cache_mode {
#define _PAGE_ENC (_AT(pteval_t, sme_me_mask))

#define _KERNPG_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | \
- _PAGE_DIRTY | _PAGE_ENC)
+ _PAGE_DIRTY_HW | _PAGE_ENC)
#define _PAGE_TABLE (_KERNPG_TABLE | _PAGE_USER)

#define __PAGE_KERNEL_ENC (__PAGE_KERNEL | _PAGE_ENC)
diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index 11eda21eb697..e7665a4767b3 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -17,7 +17,7 @@
*/

#define PTR(x) (x << 3)
-#define PAGE_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY)
+#define PAGE_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY_HW)

/*
* control_page + KEXEC_CONTROL_CODE_MAX_SIZE
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 40aa29204baf..52bd01f23316 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5235,7 +5235,7 @@ static int init_rmode_identity_map(struct kvm *kvm)
/* Set up identity-mapping pagetable for EPT in real mode */
for (i = 0; i < PT32_ENT_PER_PAGE; i++) {
tmp = (i << 22) + (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
- _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
+ _PAGE_ACCESSED | _PAGE_DIRTY_HW | _PAGE_PSE);
r = kvm_write_guest_page(kvm, identity_map_pfn,
&tmp, i * sizeof(tmp), sizeof(tmp));
if (r < 0)
--
2.15.1


2018-06-07 15:48:13

by Andy Lutomirski

Subject: Re: [PATCH 1/9] x86/cet: Control protection exception handler

On Thu, Jun 7, 2018 at 7:40 AM Yu-cheng Yu <[email protected]> wrote:
>
> A control protection exception is triggered when a control flow transfer
> attempt violates shadow stack or indirect branch tracking constraints.
> For example, the return address for a RET instruction differs from the
> safe copy on the shadow stack; or a JMP instruction arrives at a non-
> ENDBR instruction.
>
> The control protection exception handler works in a similar way to the
> general protection fault handler.
>
> Signed-off-by: Yu-cheng Yu <[email protected]>
> ---
> arch/x86/entry/entry_32.S | 5 ++++
> arch/x86/entry/entry_64.S | 2 +-
> arch/x86/include/asm/traps.h | 3 +++
> arch/x86/kernel/idt.c | 1 +
> arch/x86/kernel/traps.c | 61 ++++++++++++++++++++++++++++++++++++++++++++
> 5 files changed, 71 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index bef8e2b202a8..14b63ef0d7d8 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -1070,6 +1070,11 @@ ENTRY(general_protection)
> jmp common_exception
> END(general_protection)
>
> +ENTRY(control_protection)
> + pushl $do_control_protection
> + jmp common_exception
> +END(control_protection)

Ugh, you're seriously supporting this on 32-bit? Please test double
fault handling very carefully -- the CET interaction with task
switches is so gross that I didn't even bother reading the spec except
to let the architects know that they were a bit nuts to support it at
all.

> +
> #ifdef CONFIG_KVM_GUEST
> ENTRY(async_page_fault)
> ASM_CLAC
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 3166b9674429..5230f705d229 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -999,7 +999,7 @@ idtentry spurious_interrupt_bug do_spurious_interrupt_bug has_error_code=0
> idtentry coprocessor_error do_coprocessor_error has_error_code=0
> idtentry alignment_check do_alignment_check has_error_code=1
> idtentry simd_coprocessor_error do_simd_coprocessor_error has_error_code=0
> -
> +idtentry control_protection do_control_protection has_error_code=1
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 03f3d7695dac..4e8769a19aaf 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c

> +/*
> + * When a control protection exception occurs, send a signal
> + * to the responsible application. Currently, control
> + * protection is only enabled for the user mode. This
> + * exception should not come from the kernel mode.
> + */
> +dotraplinkage void
> +do_control_protection(struct pt_regs *regs, long error_code)
> +{
> + struct task_struct *tsk;
> +
> + RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
> + cond_local_irq_enable(regs);
> +
> + tsk = current;
> + if (!cpu_feature_enabled(X86_FEATURE_SHSTK) &&
> + !cpu_feature_enabled(X86_FEATURE_IBT)) {

static_cpu_has(), please. But your handling here is odd -- I think
that we should at least warn if we get #CP with CET disabled.

> + goto exit;
> + }
> +
> + if (!user_mode(regs)) {
> + tsk->thread.error_code = error_code;
> + tsk->thread.trap_nr = X86_TRAP_CP;

I realize you copied this from elsewhere in the file, but please
either delete these assignments to error_code and trap_nr or at least
hoist them out of the if block.

> + if (notify_die(DIE_TRAP, "control protection fault", regs,
> + error_code, X86_TRAP_CP, SIGSEGV) != NOTIFY_STOP)

Does this notify_die() check serve any purpose at all? Removing all
the old ones would be a project, but let's try not to add new callers.

> + die("control protection fault", regs, error_code);
> + return;
> + }
> +
> + tsk->thread.error_code = error_code;
> + tsk->thread.trap_nr = X86_TRAP_CP;
> +
> + if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) &&
> + printk_ratelimit()) {
> + unsigned int max_idx, err_idx;
> +
> + max_idx = ARRAY_SIZE(control_protection_err) - 1;
> + err_idx = min((unsigned int)error_code - 1, max_idx);

What if error_code == 0? Is that also invalid?

> + pr_info("%s[%d] control protection ip:%lx sp:%lx error:%lx(%s)",
> + tsk->comm, task_pid_nr(tsk),
> + regs->ip, regs->sp, error_code,
> + control_protection_err[err_idx]);
> + print_vma_addr(" in ", regs->ip);
> + pr_cont("\n");
> + }
> +
> +exit:
> + force_sig_info(SIGSEGV, SEND_SIG_PRIV, tsk);

This is definitely wrong for the feature-disabled, !user_mode case.

Also, are you planning on enabling CET for kernel code too?

2018-06-07 16:04:59

by Yu-cheng Yu

Subject: Re: [PATCH 2/9] x86/cet: Add Kconfig option for user-mode shadow stack

On Thu, 2018-06-07 at 08:47 -0700, Andy Lutomirski wrote:
> On Thu, Jun 7, 2018 at 7:40 AM Yu-cheng Yu <[email protected]> wrote:
> >
> > Introduce Kconfig option X86_INTEL_SHADOW_STACK_USER.
> >
> > An application has shadow stack protection when all the following are
> > true:
> >
> > (1) The kernel has X86_INTEL_SHADOW_STACK_USER enabled,
> > (2) The running processor supports the shadow stack,
> > (3) The application is built with shadow stack enabled tools & libs
> > and, and at runtime, all dependent shared libs can support shadow
> > stack.
> >
> > If this kernel config option is enabled, but (2) or (3) above is not
> > true, the application runs without the shadow stack protection.
> > Existing legacy applications will continue to work without the shadow
> > stack protection.
> >
> > The user-mode shadow stack protection is only implemented for the
> > 64-bit kernel. Thirty-two bit applications are supported under the
> > compatibility mode.
> >
>
> The 64-bit only part seems entirely reasonable. So please make the
> code 64-bit only :)

Yes, I will remove changes in "arch/x86/entry/entry_32.S".
We still want to support x32/ia32 in the 64-bit kernel, right?



2018-06-07 18:54:50

by Yu-cheng Yu

Subject: [PATCH 7/9] x86/mm: Shadow stack page fault error checking

If a page fault is triggered by a shadow stack access (e.g.
call/ret) or shadow stack management instructions (e.g.
wrussq), then bit[6] of the page fault error code is set.

In access_error(), we check if a shadow stack page fault
is within a shadow stack memory area.

Signed-off-by: Yu-cheng Yu <[email protected]>
---
arch/x86/include/asm/traps.h | 2 ++
arch/x86/mm/fault.c | 11 +++++++++++
2 files changed, 13 insertions(+)

diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 5196050ff3d5..58ea2f5722e9 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -157,6 +157,7 @@ enum {
* bit 3 == 1: use of reserved bit detected
* bit 4 == 1: fault was an instruction fetch
* bit 5 == 1: protection keys block access
+ * bit 6 == 1: shadow stack access fault
*/
enum x86_pf_error_code {
X86_PF_PROT = 1 << 0,
@@ -165,5 +166,6 @@ enum x86_pf_error_code {
X86_PF_RSVD = 1 << 3,
X86_PF_INSTR = 1 << 4,
X86_PF_PK = 1 << 5,
+ X86_PF_SHSTK = 1 << 6,
};
#endif /* _ASM_X86_TRAPS_H */
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 73bd8c95ac71..2b3b9170109c 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1166,6 +1166,17 @@ access_error(unsigned long error_code, struct vm_area_struct *vma)
(error_code & X86_PF_INSTR), foreign))
return 1;

+ /*
+ * Verify X86_PF_SHSTK is within a shadow stack VMA.
+ * It is always an error if there is a shadow stack
+ * fault outside a shadow stack VMA.
+ */
+ if (error_code & X86_PF_SHSTK) {
+ if (!(vma->vm_flags & VM_SHSTK))
+ return 1;
+ return 0;
+ }
+
if (error_code & X86_PF_WRITE) {
/* write, present and write, not present: */
if (unlikely(!(vma->vm_flags & VM_WRITE)))
--
2.15.1


2018-06-07 18:54:59

by Andy Lutomirski

Subject: Re: [PATCH 2/9] x86/cet: Add Kconfig option for user-mode shadow stack

On Thu, Jun 7, 2018 at 7:40 AM Yu-cheng Yu <[email protected]> wrote:
>
> Introduce Kconfig option X86_INTEL_SHADOW_STACK_USER.
>
> An application has shadow stack protection when all the following are
> true:
>
> (1) The kernel has X86_INTEL_SHADOW_STACK_USER enabled,
> (2) The running processor supports the shadow stack,
> (3) The application is built with shadow stack enabled tools & libs
> and, and at runtime, all dependent shared libs can support shadow
> stack.
>
> If this kernel config option is enabled, but (2) or (3) above is not
> true, the application runs without the shadow stack protection.
> Existing legacy applications will continue to work without the shadow
> stack protection.
>
> The user-mode shadow stack protection is only implemented for the
> 64-bit kernel. Thirty-two bit applications are supported under the
> compatibility mode.
>

The 64-bit only part seems entirely reasonable. So please make the
code 64-bit only :)

2018-06-07 18:55:23

by Yu-cheng Yu

Subject: [PATCH 1/9] x86/cet: Control protection exception handler

A control protection exception is triggered when a control flow transfer
attempt violates shadow stack or indirect branch tracking constraints.
For example, the return address for a RET instruction differs from the
safe copy on the shadow stack; or a JMP instruction arrives at a non-
ENDBR instruction.

The control protection exception handler works in a similar way to the
general protection fault handler.

Signed-off-by: Yu-cheng Yu <[email protected]>
---
arch/x86/entry/entry_32.S | 5 ++++
arch/x86/entry/entry_64.S | 2 +-
arch/x86/include/asm/traps.h | 3 +++
arch/x86/kernel/idt.c | 1 +
arch/x86/kernel/traps.c | 61 ++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 71 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index bef8e2b202a8..14b63ef0d7d8 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -1070,6 +1070,11 @@ ENTRY(general_protection)
jmp common_exception
END(general_protection)

+ENTRY(control_protection)
+ pushl $do_control_protection
+ jmp common_exception
+END(control_protection)
+
#ifdef CONFIG_KVM_GUEST
ENTRY(async_page_fault)
ASM_CLAC
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 3166b9674429..5230f705d229 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -999,7 +999,7 @@ idtentry spurious_interrupt_bug do_spurious_interrupt_bug has_error_code=0
idtentry coprocessor_error do_coprocessor_error has_error_code=0
idtentry alignment_check do_alignment_check has_error_code=1
idtentry simd_coprocessor_error do_simd_coprocessor_error has_error_code=0
-
+idtentry control_protection do_control_protection has_error_code=1

/*
* Reload gs selector with exception handling
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 3de69330e6c5..5196050ff3d5 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -26,6 +26,7 @@ asmlinkage void invalid_TSS(void);
asmlinkage void segment_not_present(void);
asmlinkage void stack_segment(void);
asmlinkage void general_protection(void);
+asmlinkage void control_protection(void);
asmlinkage void page_fault(void);
asmlinkage void async_page_fault(void);
asmlinkage void spurious_interrupt_bug(void);
@@ -77,6 +78,7 @@ dotraplinkage void do_stack_segment(struct pt_regs *, long);
dotraplinkage void do_double_fault(struct pt_regs *, long);
#endif
dotraplinkage void do_general_protection(struct pt_regs *, long);
+dotraplinkage void do_control_protection(struct pt_regs *, long);
dotraplinkage void do_page_fault(struct pt_regs *, unsigned long);
dotraplinkage void do_spurious_interrupt_bug(struct pt_regs *, long);
dotraplinkage void do_coprocessor_error(struct pt_regs *, long);
@@ -142,6 +144,7 @@ enum {
X86_TRAP_AC, /* 17, Alignment Check */
X86_TRAP_MC, /* 18, Machine Check */
X86_TRAP_XF, /* 19, SIMD Floating-Point Exception */
+ X86_TRAP_CP = 21, /* 21, Control Protection Fault */
X86_TRAP_IRET = 32, /* 32, IRET Exception */
};

diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index 2c3a1b4294eb..d00493709cec 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -85,6 +85,7 @@ static const __initconst struct idt_data def_idts[] = {
INTG(X86_TRAP_MF, coprocessor_error),
INTG(X86_TRAP_AC, alignment_check),
INTG(X86_TRAP_XF, simd_coprocessor_error),
+ INTG(X86_TRAP_CP, control_protection),

#ifdef CONFIG_X86_32
TSKG(X86_TRAP_DF, GDT_ENTRY_DOUBLEFAULT_TSS),
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 03f3d7695dac..4e8769a19aaf 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -577,6 +577,67 @@ do_general_protection(struct pt_regs *regs, long error_code)
}
NOKPROBE_SYMBOL(do_general_protection);

+static const char *control_protection_err[] =
+{
+ "near-ret",
+ "far-ret/iret",
+ "endbranch",
+ "rstorssp",
+ "setssbsy",
+ "unknown",
+};
+
+/*
+ * When a control protection exception occurs, send a signal
+ * to the responsible application. Currently, control
+ * protection is only enabled for the user mode. This
+ * exception should not come from the kernel mode.
+ */
+dotraplinkage void
+do_control_protection(struct pt_regs *regs, long error_code)
+{
+ struct task_struct *tsk;
+
+ RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+ cond_local_irq_enable(regs);
+
+ tsk = current;
+ if (!cpu_feature_enabled(X86_FEATURE_SHSTK) &&
+ !cpu_feature_enabled(X86_FEATURE_IBT)) {
+ goto exit;
+ }
+
+ if (!user_mode(regs)) {
+ tsk->thread.error_code = error_code;
+ tsk->thread.trap_nr = X86_TRAP_CP;
+ if (notify_die(DIE_TRAP, "control protection fault", regs,
+ error_code, X86_TRAP_CP, SIGSEGV) != NOTIFY_STOP)
+ die("control protection fault", regs, error_code);
+ return;
+ }
+
+ tsk->thread.error_code = error_code;
+ tsk->thread.trap_nr = X86_TRAP_CP;
+
+ if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) &&
+ printk_ratelimit()) {
+ unsigned int max_idx, err_idx;
+
+ max_idx = ARRAY_SIZE(control_protection_err) - 1;
+ err_idx = min((unsigned int)error_code - 1, max_idx);
+ pr_info("%s[%d] control protection ip:%lx sp:%lx error:%lx(%s)",
+ tsk->comm, task_pid_nr(tsk),
+ regs->ip, regs->sp, error_code,
+ control_protection_err[err_idx]);
+ print_vma_addr(" in ", regs->ip);
+ pr_cont("\n");
+ }
+
+exit:
+ force_sig_info(SIGSEGV, SEND_SIG_PRIV, tsk);
+}
+NOKPROBE_SYMBOL(do_control_protection);
+
dotraplinkage void notrace do_int3(struct pt_regs *regs, long error_code)
{
#ifdef CONFIG_DYNAMIC_FTRACE
--
2.15.1


2018-06-07 18:59:06

by Andy Lutomirski

Subject: Re: [PATCH 6/9] x86/mm: Introduce ptep_set_wrprotect_flush and related functions

On Thu, Jun 7, 2018 at 7:40 AM Yu-cheng Yu <[email protected]> wrote:
>
> The function ptep_set_wrprotect()/huge_ptep_set_wrprotect() is
> used by copy_page_range()/copy_hugetlb_page_range() to copy
> PTEs.
>
> On x86, when the shadow stack is enabled, only a shadow stack
> PTE has the read-only and _PAGE_DIRTY_HW combination. Upon
> making a dirty PTE read-only, we move its _PAGE_DIRTY_HW to
> _PAGE_DIRTY_SW.
>
> When ptep_set_wrprotect() moves _PAGE_DIRTY_HW to _PAGE_DIRTY_SW,
> if the PTE is writable and the mm is shared, another task could
> race to set _PAGE_DIRTY_HW again.
>
> Introduce ptep_set_wrprotect_flush(), pmdp_set_wrprotect_flush(),
> and huge_ptep_set_wrprotect_flush() to make sure this does not
> happen.
>

This patch adds flushes where they didn't previously exist.

> +static inline void ptep_set_wrprotect_flush(struct vm_area_struct *vma,
> + unsigned long addr, pte_t *ptep)
> +{
> + bool rw;
> +
> + rw = test_and_clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
> + if (IS_ENABLED(CONFIG_X86_INTEL_SHADOW_STACK_USER)) {
> + struct mm_struct *mm = vma->vm_mm;
> + pte_t pte;
> +
> + if (rw && (atomic_read(&mm->mm_users) > 1))
> + pte = ptep_clear_flush(vma, addr, ptep);

Why are you clearing the pte?

> -#define __HAVE_ARCH_PMDP_SET_WRPROTECT
> -static inline void pmdp_set_wrprotect(struct mm_struct *mm,
> - unsigned long addr, pmd_t *pmdp)
> +#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT_FLUSH
> +static inline void huge_ptep_set_wrprotect_flush(struct vm_area_struct *vma,
> + unsigned long addr, pte_t *ptep)
> {
> - clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp);
> + ptep_set_wrprotect_flush(vma, addr, ptep);

Maybe I'm just missing something, but you've changed the semantics of
this function significantly.

2018-06-07 19:00:01

by Yu-cheng Yu

Subject: Re: [PATCH 1/9] x86/cet: Control protection exception handler

On Thu, 2018-06-07 at 08:46 -0700, Andy Lutomirski wrote:
> On Thu, Jun 7, 2018 at 7:40 AM Yu-cheng Yu <[email protected]> wrote:
> >

...

> > diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> > index bef8e2b202a8..14b63ef0d7d8 100644
> > --- a/arch/x86/entry/entry_32.S
> > +++ b/arch/x86/entry/entry_32.S
> > @@ -1070,6 +1070,11 @@ ENTRY(general_protection)
> > jmp common_exception
> > END(general_protection)
> >
> > +ENTRY(control_protection)
> > + pushl $do_control_protection
> > + jmp common_exception
> > +END(control_protection)
>
> Ugh, you're seriously supporting this on 32-bit? Please test double
> fault handling very carefully -- the CET interaction with task
> switches is so gross that I didn't even bother reading the spec except
> to let the architects know that they were a bit nuts to support it at
> all.
>

I will remove this.

...

> > diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> > index 03f3d7695dac..4e8769a19aaf 100644
> > --- a/arch/x86/kernel/traps.c
> > +++ b/arch/x86/kernel/traps.c
>
> > +/*
> > + * When a control protection exception occurs, send a signal
> > + * to the responsible application. Currently, control
> > + * protection is only enabled for the user mode. This
> > + * exception should not come from the kernel mode.
> > + */
> > +dotraplinkage void
> > +do_control_protection(struct pt_regs *regs, long error_code)
> > +{
> > + struct task_struct *tsk;
> > +
> > + RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
> > + cond_local_irq_enable(regs);
> > +
> > + tsk = current;
> > + if (!cpu_feature_enabled(X86_FEATURE_SHSTK) &&
> > + !cpu_feature_enabled(X86_FEATURE_IBT)) {
>
> static_cpu_has(), please. But your handling here is odd -- I think
> that we should at least warn if we get #CP with CET disabled.

I will fix it.

>
> > + goto exit;
> > + }
> > +
> > + if (!user_mode(regs)) {
> > + tsk->thread.error_code = error_code;
> > + tsk->thread.trap_nr = X86_TRAP_CP;
>
> I realize you copied this from elsewhere in the file, but please
> either delete these assignments to error_code and trap_nr or at least
> hoist them out of the if block.

I will fix it.

>
> > + if (notify_die(DIE_TRAP, "control protection fault", regs,
> > + error_code, X86_TRAP_CP, SIGSEGV) != NOTIFY_STOP)
>
> Does this notify_die() check serve any purpose at all? Removing all
> the old ones would be a project, but let's try not to add new callers.

OK.

>
> > + die("control protection fault", regs, error_code);
> > + return;
> > + }
> > +
> > + tsk->thread.error_code = error_code;
> > + tsk->thread.trap_nr = X86_TRAP_CP;
> > +
> > + if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) &&
> > + printk_ratelimit()) {
> > + unsigned int max_idx, err_idx;
> > +
> > + max_idx = ARRAY_SIZE(control_protection_err) - 1;
> > + err_idx = min((unsigned int)error_code - 1, max_idx);
>
> What if error_code == 0? Is that also invalid?

The error code is between 1 and 5 inclusive. I thought if it is 0, then
err_idx would become max_idx here. I can change it to:

if (error_code == 0)
error_code = max_idx;

Or, add some comments for this case.

>
> > + pr_info("%s[%d] control protection ip:%lx sp:%lx error:%lx(%s)",
> > + tsk->comm, task_pid_nr(tsk),
> > + regs->ip, regs->sp, error_code,
> > + control_protection_err[err_idx]);
> > + print_vma_addr(" in ", regs->ip);
> > + pr_cont("\n");
> > + }
> > +
> > +exit:
> > + force_sig_info(SIGSEGV, SEND_SIG_PRIV, tsk);
>
> This is definitely wrong for the feature-disabled, !user_mode case.
>

I will fix it.

> Also, are you planning on enabling CET for kernel code too?

Yes, kernel protection will be enabled later.



2018-06-07 19:00:04

by Andy Lutomirski

Subject: Re: [PATCH 2/9] x86/cet: Add Kconfig option for user-mode shadow stack

On Thu, Jun 7, 2018 at 9:02 AM Yu-cheng Yu <[email protected]> wrote:
>
> On Thu, 2018-06-07 at 08:47 -0700, Andy Lutomirski wrote:
> > On Thu, Jun 7, 2018 at 7:40 AM Yu-cheng Yu <[email protected]> wrote:
> > >
> > > Introduce Kconfig option X86_INTEL_SHADOW_STACK_USER.
> > >
> > > An application has shadow stack protection when all the following are
> > > true:
> > >
> > > (1) The kernel has X86_INTEL_SHADOW_STACK_USER enabled,
> > > (2) The running processor supports the shadow stack,
> > > (3) The application is built with shadow stack enabled tools & libs
> > > and, and at runtime, all dependent shared libs can support shadow
> > > stack.
> > >
> > > If this kernel config option is enabled, but (2) or (3) above is not
> > > true, the application runs without the shadow stack protection.
> > > Existing legacy applications will continue to work without the shadow
> > > stack protection.
> > >
> > > The user-mode shadow stack protection is only implemented for the
> > > 64-bit kernel. Thirty-two bit applications are supported under the
> > > compatibility mode.
> > >
> >
> > The 64-bit only part seems entirely reasonable. So please make the
> > code 64-bit only :)
>
> Yes, I will remove changes in "arch/x86/entry/entry_32.S".
> We still want to support x32/ia32 in the 64-bit kernel, right?
>

Yes, I think. But that's not in entry_32.S


>

2018-06-07 19:00:20

by Andy Lutomirski

Subject: Re: [PATCH 7/9] x86/mm: Shadow stack page fault error checking

On Thu, Jun 7, 2018 at 7:40 AM Yu-cheng Yu <[email protected]> wrote:
>
> If a page fault is triggered by a shadow stack access (e.g.
> call/ret) or shadow stack management instructions (e.g.
> wrussq), then bit[6] of the page fault error code is set.
>
> In access_error(), we check if a shadow stack page fault
> is within a shadow stack memory area.
>
> Signed-off-by: Yu-cheng Yu <[email protected]>

> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 73bd8c95ac71..2b3b9170109c 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -1166,6 +1166,17 @@ access_error(unsigned long error_code, struct vm_area_struct *vma)
> (error_code & X86_PF_INSTR), foreign))
> return 1;
>
> + /*
> + * Verify X86_PF_SHSTK is within a shadow stack VMA.
> + * It is always an error if there is a shadow stack
> + * fault outside a shadow stack VMA.
> + */
> + if (error_code & X86_PF_SHSTK) {
> + if (!(vma->vm_flags & VM_SHSTK))
> + return 1;
> + return 0;
> + }
> +

What, if anything, would go wrong without this change? It seems like
it might be purely an optimization. If so, can you mention that in
the comment?

2018-06-07 19:01:45

by Yu-cheng Yu

Subject: Re: [PATCH 7/9] x86/mm: Shadow stack page fault error checking

On Thu, 2018-06-07 at 09:26 -0700, Andy Lutomirski wrote:
> On Thu, Jun 7, 2018 at 7:40 AM Yu-cheng Yu <[email protected]> wrote:
> >
> > If a page fault is triggered by a shadow stack access (e.g.
> > call/ret) or shadow stack management instructions (e.g.
> > wrussq), then bit[6] of the page fault error code is set.
> >
> > In access_error(), we check if a shadow stack page fault
> > is within a shadow stack memory area.
> >
> > Signed-off-by: Yu-cheng Yu <[email protected]>
>
> > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> > index 73bd8c95ac71..2b3b9170109c 100644
> > --- a/arch/x86/mm/fault.c
> > +++ b/arch/x86/mm/fault.c
> > @@ -1166,6 +1166,17 @@ access_error(unsigned long error_code, struct vm_area_struct *vma)
> > (error_code & X86_PF_INSTR), foreign))
> > return 1;
> >
> > + /*
> > + * Verify X86_PF_SHSTK is within a shadow stack VMA.
> > + * It is always an error if there is a shadow stack
> > + * fault outside a shadow stack VMA.
> > + */
> > + if (error_code & X86_PF_SHSTK) {
> > + if (!(vma->vm_flags & VM_SHSTK))
> > + return 1;
> > + return 0;
> > + }
> > +
>
> What, if anything, would go wrong without this change? It seems like
> it might be purely an optimization. If so, can you mention that in
> the comment?

Without this check, the page fault code could overlook the fact that the
application is trying to use a non-shadow-stack area as shadow stack.
I will add this to the comments.


2018-06-07 19:04:08

by Dave Hansen

Subject: Re: [PATCH 7/9] x86/mm: Shadow stack page fault error checking

On 06/07/2018 09:26 AM, Andy Lutomirski wrote:
>>
>> + /*
>> + * Verify X86_PF_SHSTK is within a shadow stack VMA.
>> + * It is always an error if there is a shadow stack
>> + * fault outside a shadow stack VMA.
>> + */
>> + if (error_code & X86_PF_SHSTK) {
>> + if (!(vma->vm_flags & VM_SHSTK))
>> + return 1;
>> + return 0;
>> + }
>> +
> What, if anything, would go wrong without this change? It seems like
> it might be purely an optimization. If so, can you mention that in
> the comment?

This is a fine exercise. I'm curious what it does, too.

But, I really like it being explicit in the end. If we depend on
implicit behavior, I really worry that someone breaks it accidentally.

2018-06-07 19:09:42

by Dave Hansen

Subject: Re: [PATCH 6/9] x86/mm: Introduce ptep_set_wrprotect_flush and related functions

On 06/07/2018 09:24 AM, Andy Lutomirski wrote:
>> +static inline void ptep_set_wrprotect_flush(struct vm_area_struct *vma,
>> + unsigned long addr, pte_t *ptep)
>> +{
>> + bool rw;
>> +
>> + rw = test_and_clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
>> + if (IS_ENABLED(CONFIG_X86_INTEL_SHADOW_STACK_USER)) {
>> + struct mm_struct *mm = vma->vm_mm;
>> + pte_t pte;
>> +
>> + if (rw && (atomic_read(&mm->mm_users) > 1))
>> + pte = ptep_clear_flush(vma, addr, ptep);
> Why are you clearing the pte?

I think I insisted on this being in there.

First of all, we need to flush the TLB eventually because we need the
shadowstack PTE permissions to be in effect.

But, generally, we can't clear a dirty bit in a "live" PTE without
flushing. The processor can keep writing until we flush, and even keep
setting it whenever _it_ allows a write, which it can do based on stale
TLB contents. Practically, I think a walk to set the dirty bit is
mostly the same as a TLB miss, but that's certainly not guaranteed forever.

That's even ignoring all the fun errata we have.

2018-06-07 19:10:56

by Andy Lutomirski

Subject: Re: [PATCH 6/9] x86/mm: Introduce ptep_set_wrprotect_flush and related functions

On Thu, Jun 7, 2018 at 11:23 AM Dave Hansen <[email protected]> wrote:
>
> On 06/07/2018 09:24 AM, Andy Lutomirski wrote:
> >> +static inline void ptep_set_wrprotect_flush(struct vm_area_struct *vma,
> >> + unsigned long addr, pte_t *ptep)
> >> +{
> >> + bool rw;
> >> +
> >> + rw = test_and_clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
> >> + if (IS_ENABLED(CONFIG_X86_INTEL_SHADOW_STACK_USER)) {
> >> + struct mm_struct *mm = vma->vm_mm;
> >> + pte_t pte;
> >> +
> >> + if (rw && (atomic_read(&mm->mm_users) > 1))
> >> + pte = ptep_clear_flush(vma, addr, ptep);
> > Why are you clearing the pte?
>
> I think I insisted on this being in there.
>
> First of all, we need to flush the TLB eventually because we need the
> shadowstack PTE permissions to be in effect.
>
> But, generally, we can't clear a dirty bit in a "live" PTE without
> flushing. The processor can keep writing until we flush, and even keep
> setting it whenever _it_ allows a write, which it can do based on stale
> TLB contents. Practically, I think a walk to set the dirty bit is
> mostly the same as a TLB miss, but that's certainly not guaranteed forever.
>
> That's even ignoring all the fun errata we have.

Maybe make it a separate patch, then? It seems like this patch is
doing two almost entirely separate things: adding some flushes and
adding the magic hwdirty -> swdirty transition. As it stands, it's
quite difficult to read the patch and figure out what it does.

2018-06-07 20:31:25

by Dave Hansen

Subject: Re: [PATCH 6/9] x86/mm: Introduce ptep_set_wrprotect_flush and related functions

On 06/07/2018 09:24 AM, Andy Lutomirski wrote:

>> +static inline void ptep_set_wrprotect_flush(struct vm_area_struct *vma,
>> + unsigned long addr, pte_t *ptep)
>> +{
>> + bool rw;
>> +
>> + rw = test_and_clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
>> + if (IS_ENABLED(CONFIG_X86_INTEL_SHADOW_STACK_USER)) {
>> + struct mm_struct *mm = vma->vm_mm;
>> + pte_t pte;
>> +
>> + if (rw && (atomic_read(&mm->mm_users) > 1))
>> + pte = ptep_clear_flush(vma, addr, ptep);
> Why are you clearing the pte?

I found my notes on the subject. :)

Here's the sequence that causes the problem. This could happen any time
we try to take a PTE from read-write to read-only. P==Present, W=Write,
D=Dirty:

CPU0 does a write, sees PTE with P=1,W=1,D=0
CPU0 decides to set D=1
CPU1 comes in and sets W=0
CPU0 does locked operation to set D=1
CPU0 sees P=1,W=0,D=0
CPU0 sets back P=1,W=0,D=1
CPU0 loads P=1,W=0,D=1 into the TLB
CPU0 attempts to continue the write, but sees W=0 in the TLB and a #PF
is generated because of the write fault.

The problem with this is that we end up with a shadowstack-PTE
(Write=0,Dirty=1) where we didn't want one. This, unfortunately,
imposes extra TLB flushing overhead on the R/W->R/O transitions that
does not exist before shadowstack enabling.

Yu-cheng, could you please add this to the patch description?

2018-06-07 20:39:57

by Yu-cheng Yu

Subject: Re: [PATCH 6/9] x86/mm: Introduce ptep_set_wrprotect_flush and related functions

On Thu, 2018-06-07 at 13:29 -0700, Dave Hansen wrote:
> On 06/07/2018 09:24 AM, Andy Lutomirski wrote:
>
> >> +static inline void ptep_set_wrprotect_flush(struct vm_area_struct *vma,
> >> + unsigned long addr, pte_t *ptep)
> >> +{
> >> + bool rw;
> >> +
> >> + rw = test_and_clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
> >> + if (IS_ENABLED(CONFIG_X86_INTEL_SHADOW_STACK_USER)) {
> >> + struct mm_struct *mm = vma->vm_mm;
> >> + pte_t pte;
> >> +
> >> + if (rw && (atomic_read(&mm->mm_users) > 1))
> >> + pte = ptep_clear_flush(vma, addr, ptep);
> > Why are you clearing the pte?
>
> I found my notes on the subject. :)
>
> Here's the sequence that causes the problem. This could happen any time
> we try to take a PTE from read-write to read-only. P==Present, W=Write,
> D=Dirty:
>
> CPU0 does a write, sees PTE with P=1,W=1,D=0
> CPU0 decides to set D=1
> CPU1 comes in and sets W=0
> CPU0 does locked operation to set D=1
> CPU0 sees P=1,W=0,D=0
> CPU0 sets back P=1,W=0,D=1
> CPU0 loads P=1,W=0,D=1 into the TLB
> CPU0 attempts to continue the write, but sees W=0 in the TLB and a #PF
> is generated because of the write fault.
>
> The problem with this is that we end up with a shadowstack-PTE
> (Write=0,Dirty=1) where we didn't want one. This, unfortunately,
> imposes extra TLB flushing overhead on the R/W->R/O transitions that
> does not exist before shadowstack enabling.
>
> Yu-cheng, could you please add this to the patch description?

I will add that.


2018-06-08 01:00:39

by Andy Lutomirski

Subject: Re: [PATCH 6/9] x86/mm: Introduce ptep_set_wrprotect_flush and related functions

On Thu, Jun 7, 2018 at 1:30 PM Dave Hansen <[email protected]> wrote:
>
> On 06/07/2018 09:24 AM, Andy Lutomirski wrote:
>
> >> +static inline void ptep_set_wrprotect_flush(struct vm_area_struct *vma,
> >> + unsigned long addr, pte_t *ptep)
> >> +{
> >> + bool rw;
> >> +
> >> + rw = test_and_clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
> >> + if (IS_ENABLED(CONFIG_X86_INTEL_SHADOW_STACK_USER)) {
> >> + struct mm_struct *mm = vma->vm_mm;
> >> + pte_t pte;
> >> +
> >> + if (rw && (atomic_read(&mm->mm_users) > 1))
> >> + pte = ptep_clear_flush(vma, addr, ptep);
> > Why are you clearing the pte?
>
> I found my notes on the subject. :)
>
> Here's the sequence that causes the problem. This could happen any time
> we try to take a PTE from read-write to read-only. P==Present, W=Write,
> D=Dirty:
>
> CPU0 does a write, sees PTE with P=1,W=1,D=0
> CPU0 decides to set D=1
> CPU1 comes in and sets W=0
> CPU0 does locked operation to set D=1
> CPU0 sees P=1,W=0,D=0
> CPU0 sets back P=1,W=0,D=1
> CPU0 loads P=1,W=0,D=1 into the TLB
> CPU0 attempts to continue the write, but sees W=0 in the TLB and a #PF
> is generated because of the write fault.
>
> The problem with this is that we end up with a shadowstack-PTE
> (Write=0,Dirty=1) where we didn't want one. This, unfortunately,
> imposes extra TLB flushing overhead on the R/W->R/O transitions that
> does not exist before shadowstack enabling.
>

So what exactly do the architects want the OS to do? AFAICS the only
valid ways to clear the dirty bit are:

--- Choice 1 ---
a) Set P=0.
b) Flush using an IPI
c) Read D (so we know if the page was actually dirty)
d) Set P=1,W=0,D=0

and we need to handle spurious faults that happen between steps (a)
and (c). This isn't so easy because the straightforward "is the fault
spurious" check is going to think it's *not* spurious.

--- Choice 2 ---
a) Set W=0
b) flush
c) Test and clear D

and we need to handle the spurious fault between b and c. At least
this particular spurious fault is easier to handle since we can check
the error code.

But surely the right solution is to get the architecture team to see
if they can fix the dirty-bit-setting algorithm or, even better, to
look and see if the dirty-bit-setting algorithm is *already* better
and just document it. If the cpu does a locked set-bit on D in your
example, the CPU is just being silly. The CPU should make the whole
operation fully atomic: when trying to write to a page that's D=0 in
the TLB, it should re-walk the page tables and, atomically, load the
PTE and, if it's W=1,D=0, set D=1. I'd honestly be a bit surprised if
modern CPUs don't already do this.

(Hmm. If the CPUs were that smart, then we wouldn't need a flush at
all in some cases. If we lock cmpxchg to change W=1,D=0 to W=0,D=0,
then we know that no other CPU can subsequently write the page without
re-walking, and we don't need to flush.)

Can you ask the architecture folks to clarify the situation? And, if
your notes are indeed correct, don't we need code to handle spurious
faults?

--Andy

2018-06-08 01:21:20

by Dave Hansen

Subject: Re: [PATCH 6/9] x86/mm: Introduce ptep_set_wrprotect_flush and related functions

On 06/07/2018 05:59 PM, Andy Lutomirski wrote:
> Can you ask the architecture folks to clarify the situation? And, if
> your notes are indeed correct, don't we need code to handle spurious
> faults?

I'll double check that I didn't misunderstand the situation and that it
has not changed on processors with shadow stacks.

But, as far as spurious faults, wouldn't it just be a fault because
we've transiently gone to Present=0? We already do that when clearing
the Dirty bit, so I'm not sure that's new. We surely already handle
that one.

2018-06-08 03:55:30

by kernel test robot

Subject: Re: [PATCH 4/9] x86/mm: Change _PAGE_DIRTY to _PAGE_DIRTY_HW

Hi Yu-cheng,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on asm-generic/master]
[also build test ERROR on v4.17 next-20180607]
[cannot apply to tip/x86/core mmotm/master]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Yu-cheng-Yu/Control-Flow-Enforcement-Part-2/20180608-111152
base: https://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic.git master
config: i386-tinyconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386

Note: the linux-review/Yu-cheng-Yu/Control-Flow-Enforcement-Part-2/20180608-111152 HEAD 71d9d315a5e241d9b500540a452d0bec292e1dbb builds fine.
It only hurts bisectibility.

All errors (new ones prefixed by >>):

In file included from include/linux/memremap.h:8:0,
from include/linux/mm.h:27,
from include/linux/memcontrol.h:29,
from include/linux/swap.h:9,
from include/linux/suspend.h:5,
from arch/x86/kernel/asm-offsets.c:13:
arch/x86/include/asm/pgtable.h: In function 'pte_dirty':
>> arch/x86/include/asm/pgtable.h:121:26: error: '_PAGE_DIRTY' undeclared (first use in this function); did you mean '_PAGE_DIRTY_HW'?
return pte_flags(pte) & _PAGE_DIRTY;
^~~~~~~~~~~
_PAGE_DIRTY_HW
arch/x86/include/asm/pgtable.h:121:26: note: each undeclared identifier is reported only once for each function it appears in
arch/x86/include/asm/pgtable.h: In function 'pmd_dirty':
arch/x86/include/asm/pgtable.h:145:26: error: '_PAGE_DIRTY' undeclared (first use in this function); did you mean '_PAGE_DIRTY_HW'?
return pmd_flags(pmd) & _PAGE_DIRTY;
^~~~~~~~~~~
_PAGE_DIRTY_HW
arch/x86/include/asm/pgtable.h: In function 'pud_dirty':
arch/x86/include/asm/pgtable.h:155:26: error: '_PAGE_DIRTY' undeclared (first use in this function); did you mean '_PAGE_DIRTY_HW'?
return pud_flags(pud) & _PAGE_DIRTY;
^~~~~~~~~~~
_PAGE_DIRTY_HW
arch/x86/include/asm/pgtable.h: In function 'pte_mkclean':
arch/x86/include/asm/pgtable.h:286:30: error: '_PAGE_DIRTY' undeclared (first use in this function); did you mean '_PAGE_DIRTY_HW'?
return pte_clear_flags(pte, _PAGE_DIRTY);
^~~~~~~~~~~
_PAGE_DIRTY_HW
arch/x86/include/asm/pgtable.h: In function 'pmd_mkclean':
arch/x86/include/asm/pgtable.h:370:30: error: '_PAGE_DIRTY' undeclared (first use in this function); did you mean '_PAGE_DIRTY_HW'?
return pmd_clear_flags(pmd, _PAGE_DIRTY);
^~~~~~~~~~~
_PAGE_DIRTY_HW
arch/x86/include/asm/pgtable.h: In function 'pud_mkclean':
arch/x86/include/asm/pgtable.h:429:30: error: '_PAGE_DIRTY' undeclared (first use in this function); did you mean '_PAGE_DIRTY_HW'?
return pud_clear_flags(pud, _PAGE_DIRTY);
^~~~~~~~~~~
_PAGE_DIRTY_HW
make[2]: *** [arch/x86/kernel/asm-offsets.s] Error 1
make[2]: Target '__build' not remade because of errors.
make[1]: *** [prepare0] Error 2
make[1]: Target 'prepare' not remade because of errors.
make: *** [sub-make] Error 2

vim +121 arch/x86/include/asm/pgtable.h

54321d947 arch/x86/include/asm/pgtable.h Jeremy Fitzhardinge 2009-02-11 114
8405b122a include/asm-x86/pgtable.h Jeremy Fitzhardinge 2008-01-30 115 /*
4614139c6 include/asm-x86/pgtable.h Jeremy Fitzhardinge 2008-01-30 116 * The following only work if pte_present() is true.
4614139c6 include/asm-x86/pgtable.h Jeremy Fitzhardinge 2008-01-30 117 * Undefined behaviour if not..
4614139c6 include/asm-x86/pgtable.h Jeremy Fitzhardinge 2008-01-30 118 */
3cbaeafeb include/asm-x86/pgtable.h Joe Perches 2008-03-23 119 static inline int pte_dirty(pte_t pte)
3cbaeafeb include/asm-x86/pgtable.h Joe Perches 2008-03-23 120 {
a15af1c9e include/asm-x86/pgtable.h Jeremy Fitzhardinge 2008-05-26 @121 return pte_flags(pte) & _PAGE_DIRTY;
3cbaeafeb include/asm-x86/pgtable.h Joe Perches 2008-03-23 122 }
3cbaeafeb include/asm-x86/pgtable.h Joe Perches 2008-03-23 123

:::::: The code at line 121 was first introduced by commit
:::::: a15af1c9ea2750a9ff01e51615c45950bad8221b x86/paravirt: add pte_flags to just get pte flags

:::::: TO: Jeremy Fitzhardinge <[email protected]>
:::::: CC: Thomas Gleixner <[email protected]>

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


Attachments:
(No filename) (4.94 kB)
.config.gz (6.57 kB)

2018-06-08 04:18:47

by kernel test robot

Subject: Re: [PATCH 1/9] x86/cet: Control protection exception handler

Hi Yu-cheng,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on asm-generic/master]
[also build test ERROR on v4.17 next-20180607]
[cannot apply to tip/x86/core mmotm/master]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Yu-cheng-Yu/Control-Flow-Enforcement-Part-2/20180608-111152
base: https://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic.git master
config: i386-randconfig-a0-201822 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
# save the attached .config to linux build tree
make ARCH=i386

All errors (new ones prefixed by >>):

In file included from arch/x86/include/asm/thread_info.h:53:0,
from include/linux/thread_info.h:38,
from arch/x86/include/asm/preempt.h:7,
from include/linux/preempt.h:81,
from include/linux/rcupdate.h:40,
from include/linux/rculist.h:11,
from include/linux/pid.h:5,
from include/linux/sched.h:14,
from include/linux/context_tracking.h:5,
from arch/x86/kernel/traps.c:15:
arch/x86/kernel/traps.c: In function 'do_control_protection':
>> arch/x86/kernel/traps.c:605:27: error: 'X86_FEATURE_SHSTK' undeclared (first use in this function)
if (!cpu_feature_enabled(X86_FEATURE_SHSTK) &&
^
arch/x86/include/asm/cpufeature.h:127:24: note: in definition of macro 'cpu_feature_enabled'
(__builtin_constant_p(bit) && DISABLED_MASK_BIT_SET(bit) ? 0 : static_cpu_has(bit))
^
arch/x86/kernel/traps.c:605:27: note: each undeclared identifier is reported only once for each function it appears in
if (!cpu_feature_enabled(X86_FEATURE_SHSTK) &&
^
arch/x86/include/asm/cpufeature.h:127:24: note: in definition of macro 'cpu_feature_enabled'
(__builtin_constant_p(bit) && DISABLED_MASK_BIT_SET(bit) ? 0 : static_cpu_has(bit))
^
>> arch/x86/kernel/traps.c:606:27: error: 'X86_FEATURE_IBT' undeclared (first use in this function)
!cpu_feature_enabled(X86_FEATURE_IBT)) {
^
arch/x86/include/asm/cpufeature.h:127:24: note: in definition of macro 'cpu_feature_enabled'
(__builtin_constant_p(bit) && DISABLED_MASK_BIT_SET(bit) ? 0 : static_cpu_has(bit))
^

vim +/X86_FEATURE_SHSTK +605 arch/x86/kernel/traps.c

589
590 /*
591 * When a control protection exception occurs, send a signal
592 * to the responsible application. Currently, control
593 * protection is only enabled for the user mode. This
594 * exception should not come from the kernel mode.
595 */
596 dotraplinkage void
597 do_control_protection(struct pt_regs *regs, long error_code)
598 {
599 struct task_struct *tsk;
600
601 RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
602 cond_local_irq_enable(regs);
603
604 tsk = current;
> 605 if (!cpu_feature_enabled(X86_FEATURE_SHSTK) &&
> 606 !cpu_feature_enabled(X86_FEATURE_IBT)) {
607 goto exit;
608 }
609
610 if (!user_mode(regs)) {
611 tsk->thread.error_code = error_code;
612 tsk->thread.trap_nr = X86_TRAP_CP;
613 if (notify_die(DIE_TRAP, "control protection fault", regs,
614 error_code, X86_TRAP_CP, SIGSEGV) != NOTIFY_STOP)
615 die("control protection fault", regs, error_code);
616 return;
617 }
618
619 tsk->thread.error_code = error_code;
620 tsk->thread.trap_nr = X86_TRAP_CP;
621
622 if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) &&
623 printk_ratelimit()) {
624 unsigned int max_idx, err_idx;
625
626 max_idx = ARRAY_SIZE(control_protection_err) - 1;
627 err_idx = min((unsigned int)error_code - 1, max_idx);
628 pr_info("%s[%d] control protection ip:%lx sp:%lx error:%lx(%s)",
629 tsk->comm, task_pid_nr(tsk),
630 regs->ip, regs->sp, error_code,
631 control_protection_err[err_idx]);
632 print_vma_addr(" in ", regs->ip);
633 pr_cont("\n");
634 }
635
636 exit:
637 force_sig_info(SIGSEGV, SEND_SIG_PRIV, tsk);
638 }
639 NOKPROBE_SYMBOL(do_control_protection);
640

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


Attachments:
(No filename) (4.62 kB)
.config.gz (28.58 kB)

2018-06-08 04:20:02

by kernel test robot

Subject: Re: [PATCH 1/9] x86/cet: Control protection exception handler

Hi Yu-cheng,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on asm-generic/master]
[also build test ERROR on v4.17 next-20180607]
[cannot apply to tip/x86/core mmotm/master]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Yu-cheng-Yu/Control-Flow-Enforcement-Part-2/20180608-111152
base: https://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic.git master
config: i386-tinyconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386

All errors (new ones prefixed by >>):

In file included from arch/x86/include/asm/thread_info.h:53:0,
from include/linux/thread_info.h:38,
from arch/x86/include/asm/preempt.h:7,
from include/linux/preempt.h:81,
from include/linux/rcupdate.h:40,
from include/linux/rculist.h:11,
from include/linux/pid.h:5,
from include/linux/sched.h:14,
from include/linux/context_tracking.h:5,
from arch/x86/kernel/traps.c:15:
arch/x86/kernel/traps.c: In function 'do_control_protection':
>> arch/x86/kernel/traps.c:605:27: error: 'X86_FEATURE_SHSTK' undeclared (first use in this function); did you mean 'X86_FEATURE_EST'?
if (!cpu_feature_enabled(X86_FEATURE_SHSTK) &&
^
arch/x86/include/asm/cpufeature.h:127:24: note: in definition of macro 'cpu_feature_enabled'
(__builtin_constant_p(bit) && DISABLED_MASK_BIT_SET(bit) ? 0 : static_cpu_has(bit))
^~~
arch/x86/kernel/traps.c:605:27: note: each undeclared identifier is reported only once for each function it appears in
if (!cpu_feature_enabled(X86_FEATURE_SHSTK) &&
^
arch/x86/include/asm/cpufeature.h:127:24: note: in definition of macro 'cpu_feature_enabled'
(__builtin_constant_p(bit) && DISABLED_MASK_BIT_SET(bit) ? 0 : static_cpu_has(bit))
^~~
>> arch/x86/kernel/traps.c:606:27: error: 'X86_FEATURE_IBT' undeclared (first use in this function); did you mean 'X86_FEATURE_IBS'?
!cpu_feature_enabled(X86_FEATURE_IBT)) {
^
arch/x86/include/asm/cpufeature.h:127:24: note: in definition of macro 'cpu_feature_enabled'
(__builtin_constant_p(bit) && DISABLED_MASK_BIT_SET(bit) ? 0 : static_cpu_has(bit))
^~~

vim +605 arch/x86/kernel/traps.c

589
590 /*
591 * When a control protection exception occurs, send a signal
592 * to the responsible application. Currently, control
593 * protection is only enabled for the user mode. This
594 * exception should not come from the kernel mode.
595 */
596 dotraplinkage void
597 do_control_protection(struct pt_regs *regs, long error_code)
598 {
599 struct task_struct *tsk;
600
601 RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
602 cond_local_irq_enable(regs);
603
604 tsk = current;
> 605 if (!cpu_feature_enabled(X86_FEATURE_SHSTK) &&
> 606 !cpu_feature_enabled(X86_FEATURE_IBT)) {
607 goto exit;
608 }
609
610 if (!user_mode(regs)) {
611 tsk->thread.error_code = error_code;
612 tsk->thread.trap_nr = X86_TRAP_CP;
613 if (notify_die(DIE_TRAP, "control protection fault", regs,
614 error_code, X86_TRAP_CP, SIGSEGV) != NOTIFY_STOP)
615 die("control protection fault", regs, error_code);
616 return;
617 }
618
619 tsk->thread.error_code = error_code;
620 tsk->thread.trap_nr = X86_TRAP_CP;
621
622 if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) &&
623 printk_ratelimit()) {
624 unsigned int max_idx, err_idx;
625
626 max_idx = ARRAY_SIZE(control_protection_err) - 1;
627 err_idx = min((unsigned int)error_code - 1, max_idx);
628 pr_info("%s[%d] control protection ip:%lx sp:%lx error:%lx(%s)",
629 tsk->comm, task_pid_nr(tsk),
630 regs->ip, regs->sp, error_code,
631 control_protection_err[err_idx]);
632 print_vma_addr(" in ", regs->ip);
633 pr_cont("\n");
634 }
635
636 exit:
637 force_sig_info(SIGSEGV, SEND_SIG_PRIV, tsk);
638 }
639 NOKPROBE_SYMBOL(do_control_protection);
640

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


Attachments:
(No filename) (4.66 kB)
.config.gz (6.56 kB)

2018-06-08 04:44:51

by kernel test robot

Subject: Re: [PATCH 6/9] x86/mm: Introduce ptep_set_wrprotect_flush and related functions

Hi Yu-cheng,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on asm-generic/master]
[also build test ERROR on v4.17 next-20180607]
[cannot apply to tip/x86/core mmotm/master]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Yu-cheng-Yu/Control-Flow-Enforcement-Part-2/20180608-111152
base: https://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic.git master
config: sparc64-allmodconfig (attached as .config)
compiler: sparc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=sparc64

All error/warnings (new ones prefixed by >>):

In file included from arch/sparc/include/asm/pgtable_64.h:1064:0,
from arch/sparc/include/asm/pgtable.h:5,
from include/linux/memremap.h:8,
from include/linux/mm.h:27,
from mm//swap.c:16:
include/asm-generic/pgtable.h: In function 'huge_ptep_set_wrorptect_flush':
>> include/asm-generic/pgtable.h:129:2: error: implicit declaration of function 'huge_ptep_set_wrprotect'; did you mean 'huge_ptep_set_wrorptect_flush'? [-Werror=implicit-function-declaration]
huge_ptep_set_wrprotect(vma->vm_mm, addr, ptep);
^~~~~~~~~~~~~~~~~~~~~~~
huge_ptep_set_wrorptect_flush
In file included from include/linux/hugetlb.h:445:0,
from mm//swap.c:35:
arch/sparc/include/asm/hugetlb.h: At top level:
>> arch/sparc/include/asm/hugetlb.h:59:20: warning: conflicting types for 'huge_ptep_set_wrprotect'
static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
^~~~~~~~~~~~~~~~~~~~~~~
>> arch/sparc/include/asm/hugetlb.h:59:20: error: static declaration of 'huge_ptep_set_wrprotect' follows non-static declaration
In file included from arch/sparc/include/asm/pgtable_64.h:1064:0,
from arch/sparc/include/asm/pgtable.h:5,
from include/linux/memremap.h:8,
from include/linux/mm.h:27,
from mm//swap.c:16:
include/asm-generic/pgtable.h:129:2: note: previous implicit declaration of 'huge_ptep_set_wrprotect' was here
huge_ptep_set_wrprotect(vma->vm_mm, addr, ptep);
^~~~~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors

vim +129 include/asm-generic/pgtable.h

123
124 #ifndef __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT_FLUSH
125 static inline void huge_ptep_set_wrorptect_flush(struct vm_area_struct *vma,
126 unsigned long addr,
127 pte_t *ptep)
128 {
> 129 huge_ptep_set_wrprotect(vma->vm_mm, addr, ptep);
130 }
131 #endif
132

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


Attachments:
(No filename) (3.06 kB)
.config.gz (52.12 kB)

2018-06-08 05:17:27

by kernel test robot

Subject: Re: [PATCH 5/9] x86/mm: Introduce _PAGE_DIRTY_SW

Hi Yu-cheng,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on asm-generic/master]
[also build test WARNING on v4.17 next-20180607]
[cannot apply to tip/x86/core mmotm/master]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Yu-cheng-Yu/Control-Flow-Enforcement-Part-2/20180608-111152
base: https://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic.git master
config: i386-randconfig-x003-201822 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386

All warnings (new ones prefixed by >>):

In file included from arch/x86/include/asm/current.h:5:0,
from include/linux/sched.h:12,
from include/linux/context_tracking.h:5,
from arch/x86/kernel/traps.c:15:
arch/x86/kernel/traps.c: In function 'do_control_protection':
arch/x86/kernel/traps.c:605:27: error: 'X86_FEATURE_SHSTK' undeclared (first use in this function); did you mean 'X86_FEATURE_EST'?
if (!cpu_feature_enabled(X86_FEATURE_SHSTK) &&
^
include/linux/compiler.h:58:30: note: in definition of macro '__trace_if'
if (__builtin_constant_p(!!(cond)) ? !!(cond) : \
^~~~
>> arch/x86/kernel/traps.c:605:2: note: in expansion of macro 'if'
if (!cpu_feature_enabled(X86_FEATURE_SHSTK) &&
^~
>> arch/x86/kernel/traps.c:605:7: note: in expansion of macro 'cpu_feature_enabled'
if (!cpu_feature_enabled(X86_FEATURE_SHSTK) &&
^~~~~~~~~~~~~~~~~~~
arch/x86/kernel/traps.c:605:27: note: each undeclared identifier is reported only once for each function it appears in
if (!cpu_feature_enabled(X86_FEATURE_SHSTK) &&
^
include/linux/compiler.h:58:30: note: in definition of macro '__trace_if'
if (__builtin_constant_p(!!(cond)) ? !!(cond) : \
^~~~
>> arch/x86/kernel/traps.c:605:2: note: in expansion of macro 'if'
if (!cpu_feature_enabled(X86_FEATURE_SHSTK) &&
^~
>> arch/x86/kernel/traps.c:605:7: note: in expansion of macro 'cpu_feature_enabled'
if (!cpu_feature_enabled(X86_FEATURE_SHSTK) &&
^~~~~~~~~~~~~~~~~~~
arch/x86/kernel/traps.c:606:27: error: 'X86_FEATURE_IBT' undeclared (first use in this function); did you mean 'X86_FEATURE_IBS'?
!cpu_feature_enabled(X86_FEATURE_IBT)) {
^
include/linux/compiler.h:58:30: note: in definition of macro '__trace_if'
if (__builtin_constant_p(!!(cond)) ? !!(cond) : \
^~~~
>> arch/x86/kernel/traps.c:605:2: note: in expansion of macro 'if'
if (!cpu_feature_enabled(X86_FEATURE_SHSTK) &&
^~
arch/x86/kernel/traps.c:606:7: note: in expansion of macro 'cpu_feature_enabled'
!cpu_feature_enabled(X86_FEATURE_IBT)) {
^~~~~~~~~~~~~~~~~~~

vim +/if +605 arch/x86/kernel/traps.c

a74a6de2 Yu-cheng Yu 2018-06-07 589
a74a6de2 Yu-cheng Yu 2018-06-07 590 /*
a74a6de2 Yu-cheng Yu 2018-06-07 591 * When a control protection exception occurs, send a signal
a74a6de2 Yu-cheng Yu 2018-06-07 592 * to the responsible application. Currently, control
a74a6de2 Yu-cheng Yu 2018-06-07 593 * protection is only enabled for the user mode. This
a74a6de2 Yu-cheng Yu 2018-06-07 594 * exception should not come from the kernel mode.
a74a6de2 Yu-cheng Yu 2018-06-07 595 */
a74a6de2 Yu-cheng Yu 2018-06-07 596 dotraplinkage void
a74a6de2 Yu-cheng Yu 2018-06-07 597 do_control_protection(struct pt_regs *regs, long error_code)
a74a6de2 Yu-cheng Yu 2018-06-07 598 {
a74a6de2 Yu-cheng Yu 2018-06-07 599 struct task_struct *tsk;
a74a6de2 Yu-cheng Yu 2018-06-07 600
a74a6de2 Yu-cheng Yu 2018-06-07 601 RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
a74a6de2 Yu-cheng Yu 2018-06-07 602 cond_local_irq_enable(regs);
a74a6de2 Yu-cheng Yu 2018-06-07 603
a74a6de2 Yu-cheng Yu 2018-06-07 604 tsk = current;
a74a6de2 Yu-cheng Yu 2018-06-07 @605 if (!cpu_feature_enabled(X86_FEATURE_SHSTK) &&
a74a6de2 Yu-cheng Yu 2018-06-07 606 !cpu_feature_enabled(X86_FEATURE_IBT)) {
a74a6de2 Yu-cheng Yu 2018-06-07 607 goto exit;
a74a6de2 Yu-cheng Yu 2018-06-07 608 }
a74a6de2 Yu-cheng Yu 2018-06-07 609
a74a6de2 Yu-cheng Yu 2018-06-07 610 if (!user_mode(regs)) {
a74a6de2 Yu-cheng Yu 2018-06-07 611 tsk->thread.error_code = error_code;
a74a6de2 Yu-cheng Yu 2018-06-07 612 tsk->thread.trap_nr = X86_TRAP_CP;
a74a6de2 Yu-cheng Yu 2018-06-07 613 if (notify_die(DIE_TRAP, "control protection fault", regs,
a74a6de2 Yu-cheng Yu 2018-06-07 614 error_code, X86_TRAP_CP, SIGSEGV) != NOTIFY_STOP)
a74a6de2 Yu-cheng Yu 2018-06-07 615 die("control protection fault", regs, error_code);
a74a6de2 Yu-cheng Yu 2018-06-07 616 return;
a74a6de2 Yu-cheng Yu 2018-06-07 617 }
a74a6de2 Yu-cheng Yu 2018-06-07 618
a74a6de2 Yu-cheng Yu 2018-06-07 619 tsk->thread.error_code = error_code;
a74a6de2 Yu-cheng Yu 2018-06-07 620 tsk->thread.trap_nr = X86_TRAP_CP;
a74a6de2 Yu-cheng Yu 2018-06-07 621
a74a6de2 Yu-cheng Yu 2018-06-07 622 if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) &&
a74a6de2 Yu-cheng Yu 2018-06-07 623 printk_ratelimit()) {
a74a6de2 Yu-cheng Yu 2018-06-07 624 unsigned int max_idx, err_idx;
a74a6de2 Yu-cheng Yu 2018-06-07 625
a74a6de2 Yu-cheng Yu 2018-06-07 626 max_idx = ARRAY_SIZE(control_protection_err) - 1;
a74a6de2 Yu-cheng Yu 2018-06-07 627 err_idx = min((unsigned int)error_code - 1, max_idx);
a74a6de2 Yu-cheng Yu 2018-06-07 628 pr_info("%s[%d] control protection ip:%lx sp:%lx error:%lx(%s)",
a74a6de2 Yu-cheng Yu 2018-06-07 629 tsk->comm, task_pid_nr(tsk),
a74a6de2 Yu-cheng Yu 2018-06-07 630 regs->ip, regs->sp, error_code,
a74a6de2 Yu-cheng Yu 2018-06-07 631 control_protection_err[err_idx]);
a74a6de2 Yu-cheng Yu 2018-06-07 632 print_vma_addr(" in ", regs->ip);
a74a6de2 Yu-cheng Yu 2018-06-07 633 pr_cont("\n");
a74a6de2 Yu-cheng Yu 2018-06-07 634 }
a74a6de2 Yu-cheng Yu 2018-06-07 635
a74a6de2 Yu-cheng Yu 2018-06-07 636 exit:
a74a6de2 Yu-cheng Yu 2018-06-07 637 force_sig_info(SIGSEGV, SEND_SIG_PRIV, tsk);
a74a6de2 Yu-cheng Yu 2018-06-07 638 }
a74a6de2 Yu-cheng Yu 2018-06-07 639 NOKPROBE_SYMBOL(do_control_protection);
a74a6de2 Yu-cheng Yu 2018-06-07 640

:::::: The code at line 605 was first introduced by commit
:::::: a74a6de2a3290257798598ae1f816eddb04f63f2 x86/cet: Control protection exception handler

:::::: TO: Yu-cheng Yu <[email protected]>
:::::: CC: 0day robot <[email protected]>

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


Attachments:
(No filename) (6.92 kB)
.config.gz (32.82 kB)

2018-06-08 06:00:14

by kernel test robot

Subject: Re: [PATCH 6/9] x86/mm: Introduce ptep_set_wrprotect_flush and related functions

Hi Yu-cheng,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on asm-generic/master]
[also build test ERROR on v4.17 next-20180607]
[cannot apply to tip/x86/core mmotm/master]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Yu-cheng-Yu/Control-Flow-Enforcement-Part-2/20180608-111152
base: https://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic.git master
config: xtensa-allmodconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 8.1.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=xtensa

All errors (new ones prefixed by >>):

In file included from arch/xtensa/include/asm/pgtable.h:444,
from include/linux/memremap.h:8,
from include/linux/mm.h:27,
from include/linux/pid_namespace.h:7,
from include/linux/ptrace.h:10,
from arch/xtensa/kernel/asm-offsets.c:21:
include/asm-generic/pgtable.h: In function 'huge_ptep_set_wrorptect_flush':
>> include/asm-generic/pgtable.h:129:2: error: implicit declaration of function 'huge_ptep_set_wrprotect'; did you mean 'ptep_set_wrprotect'? [-Werror=implicit-function-declaration]
huge_ptep_set_wrprotect(vma->vm_mm, addr, ptep);
^~~~~~~~~~~~~~~~~~~~~~~
ptep_set_wrprotect
cc1: some warnings being treated as errors
make[2]: *** [arch/xtensa/kernel/asm-offsets.s] Error 1
make[2]: Target '__build' not remade because of errors.
make[1]: *** [prepare0] Error 2
make[1]: Target 'prepare' not remade because of errors.
make: *** [sub-make] Error 2

vim +129 include/asm-generic/pgtable.h

123
124 #ifndef __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT_FLUSH
125 static inline void huge_ptep_set_wrorptect_flush(struct vm_area_struct *vma,
126 unsigned long addr,
127 pte_t *ptep)
128 {
> 129 huge_ptep_set_wrprotect(vma->vm_mm, addr, ptep);
130 }
131 #endif
132

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


Attachments:
(No filename) (2.37 kB)
.config.gz (51.95 kB)