2015-12-14 18:31:26

by Andy Lutomirski

Subject: [PATCH v2 0/6] mm, x86/vdso: Special IO mapping improvements

This applies on top of the earlier vdso pvclock series I sent out.
Once that lands in -tip, this will apply to -tip.

This series cleans up the hack that is our vvar mapping. We currently
initialize the vvar mapping as a special mapping vma backed by nothing
whatsoever and then we abuse remap_pfn_range to populate it.

This cheats the mm core, probably breaks under various evil madvise
workloads, and prevents handling faults in more interesting ways.

To clean it up, this series:

- Adds a special mapping .fault operation
- Adds a vm_insert_pfn_prot helper
- Uses the new .fault infrastructure in x86's vdso and vvar mappings
- Hardens the HPET mapping, mitigating an HW attack surface that bothers me

akpm, can you ack patch 1?

Changes from v1:
- Lots of changelog clarification requested by akpm
- Minor tweaks to style and comments in the first two patches

Andy Lutomirski (6):
mm: Add a vm_special_mapping .fault method
mm: Add vm_insert_pfn_prot
x86/vdso: Track each mm's loaded vdso image as well as its base
x86,vdso: Use .fault for the vdso text mapping
x86,vdso: Use .fault instead of remap_pfn_range for the vvar mapping
x86/vdso: Disallow vvar access to vclock IO for never-used vclocks

arch/x86/entry/vdso/vdso2c.h | 7 --
arch/x86/entry/vdso/vma.c | 124 ++++++++++++++++++++------------
arch/x86/entry/vsyscall/vsyscall_gtod.c | 9 ++-
arch/x86/include/asm/clocksource.h | 9 +--
arch/x86/include/asm/mmu.h | 3 +-
arch/x86/include/asm/vdso.h | 3 -
arch/x86/include/asm/vgtod.h | 6 ++
include/linux/mm.h | 2 +
include/linux/mm_types.h | 22 +++++-
mm/memory.c | 25 ++++++-
mm/mmap.c | 13 ++--
11 files changed, 151 insertions(+), 72 deletions(-)

--
2.5.0


2015-12-14 18:33:41

by Andy Lutomirski

Subject: [PATCH v2 1/6] mm: Add a vm_special_mapping .fault method

From: Andy Lutomirski <[email protected]>

Requiring special mappings to give a list of struct pages is
inflexible: it prevents sane use of IO memory in a special mapping,
it's inefficient (it requires arch code to initialize a list of
struct pages, and it requires the mm core to walk the entire list
just to figure out how long it is), and it prevents arch code from
doing anything fancy when a special mapping fault occurs.

Add a .fault method as an alternative to filling in a .pages array.
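
As an aside, for illustration only (not part of this patch): a minimal
hypothetical user of the new hook could look roughly like this, where
example_page, example_fault and example_mapping are made-up names and
the callback signature matches the struct documented below:

static struct page *example_page;	/* assumed to be set up elsewhere */

static int example_fault(const struct vm_special_mapping *sm,
			 struct vm_area_struct *vma, struct vm_fault *vmf)
{
	/* Only page 0 of this hypothetical mapping is backed. */
	if (vmf->pgoff != 0)
		return VM_FAULT_SIGBUS;

	/* Hand the mm core a refcounted struct page, as .pages would. */
	vmf->page = example_page;
	get_page(vmf->page);
	return 0;
}

static const struct vm_special_mapping example_mapping = {
	.name = "[example]",
	.fault = example_fault,
};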

Looks-OK-to: Andrew Morton <[email protected]>
Signed-off-by: Andy Lutomirski <[email protected]>
---

Notes:
Changes from v1:
- Fixed "struct vm_special_mapping" code layout (akpm)
- s/is// (akpm)

include/linux/mm_types.h | 22 +++++++++++++++++++---
mm/mmap.c | 13 +++++++++----
2 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index f8d1492a114f..c88e48a3c155 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -568,10 +568,26 @@ static inline void clear_tlb_flush_pending(struct mm_struct *mm)
}
#endif

-struct vm_special_mapping
-{
- const char *name;
+struct vm_fault;
+
+struct vm_special_mapping {
+ const char *name; /* The name, e.g. "[vdso]". */
+
+ /*
+ * If .fault is not provided, this points to a
+ * NULL-terminated array of pages that back the special mapping.
+ *
+ * This must not be NULL unless .fault is provided.
+ */
struct page **pages;
+
+ /*
+ * If non-NULL, then this is called to resolve page faults
+ * on the special mapping. If used, .pages is not checked.
+ */
+ int (*fault)(const struct vm_special_mapping *sm,
+ struct vm_area_struct *vma,
+ struct vm_fault *vmf);
};

enum tlb_flush_reason {
diff --git a/mm/mmap.c b/mm/mmap.c
index 2ce04a649f6b..f717453b1a57 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3030,11 +3030,16 @@ static int special_mapping_fault(struct vm_area_struct *vma,
pgoff_t pgoff;
struct page **pages;

- if (vma->vm_ops == &legacy_special_mapping_vmops)
+ if (vma->vm_ops == &legacy_special_mapping_vmops) {
pages = vma->vm_private_data;
- else
- pages = ((struct vm_special_mapping *)vma->vm_private_data)->
- pages;
+ } else {
+ struct vm_special_mapping *sm = vma->vm_private_data;
+
+ if (sm->fault)
+ return sm->fault(sm, vma, vmf);
+
+ pages = sm->pages;
+ }

for (pgoff = vmf->pgoff; pgoff && *pages; ++pages)
pgoff--;
--
2.5.0

2015-12-14 18:31:29

by Andy Lutomirski

Subject: [PATCH v2 2/6] mm: Add vm_insert_pfn_prot

The x86 vvar vma contains pages with differing cacheability
flags. x86 currently implements this by manually inserting all the ptes
using (io_)remap_pfn_range when the vma is set up.

x86 wants to move to using .fault with VM_FAULT_NOPAGE to set up the
mappings as needed. The correct API to use to insert a pfn in
.fault is vm_insert_pfn, but vm_insert_pfn can't override the vma's
cache mode, and the HPET page in particular needs to be uncached
despite the fact that the rest of the VMA is cached.

Add vm_insert_pfn_prot to support varying cacheability within the
same non-COW VMA in a more sane manner.

x86 could alternatively use multiple VMAs, but that's messy, would
break CRIU, and would create unnecessary VMAs that would waste
memory.
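
For illustration (a sketch, not code from this series): a special-mapping
.fault handler can use the new helper to map a single uncached IO page
while the rest of the VMA keeps its normal cache mode. my_io_fault and
io_phys are hypothetical names; the pattern mirrors what the later vvar
patch does for the HPET page:

static phys_addr_t io_phys;	/* hypothetical: physical address of the IO page */

static int my_io_fault(const struct vm_special_mapping *sm,
		       struct vm_area_struct *vma, struct vm_fault *vmf)
{
	if (vm_insert_pfn_prot(vma, (unsigned long)vmf->virtual_address,
			       io_phys >> PAGE_SHIFT,
			       pgprot_noncached(PAGE_READONLY)))
		return VM_FAULT_SIGBUS;

	return VM_FAULT_NOPAGE;
}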

Acked-by: Andrew Morton <[email protected]>
Signed-off-by: Andy Lutomirski <[email protected]>
---

Notes:
Changes from v1:
- Improve the changelog (akpm)

include/linux/mm.h | 2 ++
mm/memory.c | 25 +++++++++++++++++++++++--
2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 00bad7793788..87ef1d7730ba 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2080,6 +2080,8 @@ int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page *);
int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
unsigned long pfn);
+int vm_insert_pfn_prot(struct vm_area_struct *vma, unsigned long addr,
+ unsigned long pfn, pgprot_t pgprot);
int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
unsigned long pfn);
int vm_iomap_memory(struct vm_area_struct *vma, phys_addr_t start, unsigned long len);
diff --git a/mm/memory.c b/mm/memory.c
index c387430f06c3..a29f0b90fc56 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1564,8 +1564,29 @@ out:
int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
unsigned long pfn)
{
+ return vm_insert_pfn_prot(vma, addr, pfn, vma->vm_page_prot);
+}
+EXPORT_SYMBOL(vm_insert_pfn);
+
+/**
+ * vm_insert_pfn_prot - insert single pfn into user vma with specified pgprot
+ * @vma: user vma to map to
+ * @addr: target user address of this page
+ * @pfn: source kernel pfn
+ * @pgprot: pgprot flags for the inserted page
+ *
+ * This is exactly like vm_insert_pfn, except that it allows drivers
+ * to override pgprot on a per-page basis.
+ *
+ * This only makes sense for IO mappings, and it makes no sense for
+ * cow mappings. In general, using multiple vmas is preferable;
+ * vm_insert_pfn_prot should only be used if using multiple VMAs is
+ * impractical.
+ */
+int vm_insert_pfn_prot(struct vm_area_struct *vma, unsigned long addr,
+ unsigned long pfn, pgprot_t pgprot)
+{
int ret;
- pgprot_t pgprot = vma->vm_page_prot;
/*
* Technically, architectures with pte_special can avoid all these
* restrictions (same for remap_pfn_range). However we would like
@@ -1587,7 +1608,7 @@ int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,

return ret;
}
-EXPORT_SYMBOL(vm_insert_pfn);
+EXPORT_SYMBOL(vm_insert_pfn_prot);

int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
unsigned long pfn)
--
2.5.0

2015-12-14 18:33:18

by Andy Lutomirski

Subject: [PATCH v2 3/6] x86/vdso: Track each mm's loaded vdso image as well as its base

As we start to do more intelligent things with the vdso at runtime
(as opposed to just at mm initialization time), we'll need to know
which vdso is in use.

In principle, we could guess based on the mm type, but that's
over-complicated and error-prone. Instead, just track it in the mmu
context.
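
For illustration (hypothetical helper, not part of this patch): with the
image pointer tracked per-mm, later code can turn an image-relative
symbol offset into a user address without guessing which vdso variant
is mapped:

static unsigned long vdso_sym_addr(struct mm_struct *mm, long sym_offset)
{
	/* Sketch only: returns 0 if this mm has no vdso mapped. */
	if (!mm->context.vdso_image)
		return 0;

	return (unsigned long)mm->context.vdso + sym_offset;
}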

Signed-off-by: Andy Lutomirski <[email protected]>
---
arch/x86/entry/vdso/vma.c | 1 +
arch/x86/include/asm/mmu.h | 3 ++-
2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index b8f69e264ac4..80b021067bd6 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -121,6 +121,7 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr)

text_start = addr - image->sym_vvar_start;
current->mm->context.vdso = (void __user *)text_start;
+ current->mm->context.vdso_image = image;

/*
* MAYWRITE to allow gdb to COW and set breakpoints
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 55234d5e7160..1ea0baef1175 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -19,7 +19,8 @@ typedef struct {
#endif

struct mutex lock;
- void __user *vdso;
+ void __user *vdso; /* vdso base address */
+ const struct vdso_image *vdso_image; /* vdso image in use */

atomic_t perf_rdpmc_allowed; /* nonzero if rdpmc is allowed */
} mm_context_t;
--
2.5.0

2015-12-14 18:31:32

by Andy Lutomirski

Subject: [PATCH v2 4/6] x86,vdso: Use .fault for the vdso text mapping

The old scheme for mapping the vdso text is rather complicated. vdso2c
generates a struct vm_special_mapping and a blank .pages array of the
correct size for each vdso image. Init code in vdso/vma.c populates
the .pages array for each vdso image, and the mapping code selects
the appropriate struct vm_special_mapping.

With .fault, we can use a less roundabout approach: vdso_fault
just returns the appropriate page for the selected vdso image.

Signed-off-by: Andy Lutomirski <[email protected]>
---
arch/x86/entry/vdso/vdso2c.h | 7 -------
arch/x86/entry/vdso/vma.c | 26 +++++++++++++++++++-------
arch/x86/include/asm/vdso.h | 3 ---
3 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/arch/x86/entry/vdso/vdso2c.h b/arch/x86/entry/vdso/vdso2c.h
index 0224987556ce..abe961c7c71c 100644
--- a/arch/x86/entry/vdso/vdso2c.h
+++ b/arch/x86/entry/vdso/vdso2c.h
@@ -150,16 +150,9 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
}
fprintf(outfile, "\n};\n\n");

- fprintf(outfile, "static struct page *pages[%lu];\n\n",
- mapping_size / 4096);
-
fprintf(outfile, "const struct vdso_image %s = {\n", name);
fprintf(outfile, "\t.data = raw_data,\n");
fprintf(outfile, "\t.size = %lu,\n", mapping_size);
- fprintf(outfile, "\t.text_mapping = {\n");
- fprintf(outfile, "\t\t.name = \"[vdso]\",\n");
- fprintf(outfile, "\t\t.pages = pages,\n");
- fprintf(outfile, "\t},\n");
if (alt_sec) {
fprintf(outfile, "\t.alt = %lu,\n",
(unsigned long)GET_LE(&alt_sec->sh_offset));
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 80b021067bd6..eb50d7c1f161 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -27,13 +27,7 @@ unsigned int __read_mostly vdso64_enabled = 1;

void __init init_vdso_image(const struct vdso_image *image)
{
- int i;
- int npages = (image->size) / PAGE_SIZE;
-
BUG_ON(image->size % PAGE_SIZE != 0);
- for (i = 0; i < npages; i++)
- image->text_mapping.pages[i] =
- virt_to_page(image->data + i*PAGE_SIZE);

apply_alternatives((struct alt_instr *)(image->data + image->alt),
(struct alt_instr *)(image->data + image->alt +
@@ -90,6 +84,24 @@ static unsigned long vdso_addr(unsigned long start, unsigned len)
#endif
}

+static int vdso_fault(const struct vm_special_mapping *sm,
+ struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+ const struct vdso_image *image = vma->vm_mm->context.vdso_image;
+
+ if (!image || (vmf->pgoff << PAGE_SHIFT) >= image->size)
+ return VM_FAULT_SIGBUS;
+
+ vmf->page = virt_to_page(image->data + (vmf->pgoff << PAGE_SHIFT));
+ get_page(vmf->page);
+ return 0;
+}
+
+static const struct vm_special_mapping text_mapping = {
+ .name = "[vdso]",
+ .fault = vdso_fault,
+};
+
static int map_vdso(const struct vdso_image *image, bool calculate_addr)
{
struct mm_struct *mm = current->mm;
@@ -131,7 +143,7 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr)
image->size,
VM_READ|VM_EXEC|
VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
- &image->text_mapping);
+ &text_mapping);

if (IS_ERR(vma)) {
ret = PTR_ERR(vma);
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index deabaf9759b6..43dc55be524e 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -13,9 +13,6 @@ struct vdso_image {
void *data;
unsigned long size; /* Always a multiple of PAGE_SIZE */

- /* text_mapping.pages is big enough for data/size page pointers */
- struct vm_special_mapping text_mapping;
-
unsigned long alt, alt_len;

long sym_vvar_start; /* Negative offset to the vvar area */
--
2.5.0

2015-12-14 18:32:03

by Andy Lutomirski

Subject: [PATCH v2 5/6] x86,vdso: Use .fault instead of remap_pfn_range for the vvar mapping

This is IMO much less ugly, and it also opens the door to
disallowing unprivileged userspace HPET access on systems with
usable TSCs.

Signed-off-by: Andy Lutomirski <[email protected]>
---
arch/x86/entry/vdso/vma.c | 97 ++++++++++++++++++++++++++++-------------------
1 file changed, 57 insertions(+), 40 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index eb50d7c1f161..02221e98b83f 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -102,18 +102,69 @@ static const struct vm_special_mapping text_mapping = {
.fault = vdso_fault,
};

+static int vvar_fault(const struct vm_special_mapping *sm,
+ struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+ const struct vdso_image *image = vma->vm_mm->context.vdso_image;
+ long sym_offset;
+ int ret = -EFAULT;
+
+ if (!image)
+ return VM_FAULT_SIGBUS;
+
+ sym_offset = (long)(vmf->pgoff << PAGE_SHIFT) +
+ image->sym_vvar_start;
+
+ /*
+ * Sanity check: a symbol offset of zero means that the page
+ * does not exist for this vdso image, not that the page is at
+ * offset zero relative to the text mapping. This should be
+ * impossible here, because sym_offset should only be zero for
+ * the page past the end of the vvar mapping.
+ */
+ if (sym_offset == 0)
+ return VM_FAULT_SIGBUS;
+
+ if (sym_offset == image->sym_vvar_page) {
+ ret = vm_insert_pfn(vma, (unsigned long)vmf->virtual_address,
+ __pa_symbol(&__vvar_page) >> PAGE_SHIFT);
+ } else if (sym_offset == image->sym_hpet_page) {
+#ifdef CONFIG_HPET_TIMER
+ if (hpet_address) {
+ ret = vm_insert_pfn_prot(
+ vma,
+ (unsigned long)vmf->virtual_address,
+ hpet_address >> PAGE_SHIFT,
+ pgprot_noncached(PAGE_READONLY));
+ }
+#endif
+ } else if (sym_offset == image->sym_pvclock_page) {
+ struct pvclock_vsyscall_time_info *pvti =
+ pvclock_pvti_cpu0_va();
+ if (pvti) {
+ ret = vm_insert_pfn(
+ vma,
+ (unsigned long)vmf->virtual_address,
+ __pa(pvti) >> PAGE_SHIFT);
+ }
+ }
+
+ if (ret == 0)
+ return VM_FAULT_NOPAGE;
+
+ return VM_FAULT_SIGBUS;
+}
+
static int map_vdso(const struct vdso_image *image, bool calculate_addr)
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
unsigned long addr, text_start;
int ret = 0;
- static struct page *no_pages[] = {NULL};
- static struct vm_special_mapping vvar_mapping = {
+ static const struct vm_special_mapping vvar_mapping = {
.name = "[vvar]",
- .pages = no_pages,
+ .fault = vvar_fault,
};
- struct pvclock_vsyscall_time_info *pvti;

if (calculate_addr) {
addr = vdso_addr(current->mm->start_stack,
@@ -153,7 +204,8 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr)
vma = _install_special_mapping(mm,
addr,
-image->sym_vvar_start,
- VM_READ|VM_MAYREAD,
+ VM_READ|VM_MAYREAD|VM_IO|VM_DONTDUMP|
+ VM_PFNMAP,
&vvar_mapping);

if (IS_ERR(vma)) {
@@ -161,41 +213,6 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr)
goto up_fail;
}

- if (image->sym_vvar_page)
- ret = remap_pfn_range(vma,
- text_start + image->sym_vvar_page,
- __pa_symbol(&__vvar_page) >> PAGE_SHIFT,
- PAGE_SIZE,
- PAGE_READONLY);
-
- if (ret)
- goto up_fail;
-
-#ifdef CONFIG_HPET_TIMER
- if (hpet_address && image->sym_hpet_page) {
- ret = io_remap_pfn_range(vma,
- text_start + image->sym_hpet_page,
- hpet_address >> PAGE_SHIFT,
- PAGE_SIZE,
- pgprot_noncached(PAGE_READONLY));
-
- if (ret)
- goto up_fail;
- }
-#endif
-
- pvti = pvclock_pvti_cpu0_va();
- if (pvti && image->sym_pvclock_page) {
- ret = remap_pfn_range(vma,
- text_start + image->sym_pvclock_page,
- __pa(pvti) >> PAGE_SHIFT,
- PAGE_SIZE,
- PAGE_READONLY);
-
- if (ret)
- goto up_fail;
- }
-
up_fail:
if (ret)
current->mm->context.vdso = NULL;
--
2.5.0

2015-12-14 18:31:36

by Andy Lutomirski

Subject: [PATCH v2 6/6] x86/vdso: Disallow vvar access to vclock IO for never-used vclocks

It makes me uncomfortable that even modern systems grant every
process direct read access to the HPET.

While fixing this for real without regressing anything is a mess
(unmapping the HPET is tricky because we don't adequately track all
the mappings), we can do almost as well by tracking which vclocks
have ever been used and only allowing pages associated with used
vclocks to be faulted in.

This will cause rogue programs that try to peek at the HPET to get
SIGBUS instead on most systems.

We can't restrict faults to vclock pages that are associated with
the currently selected vclock due to a race: a process could start
to access the HPET for the first time and race against a switch away
from the HPET as the current clocksource. We can't segfault the
process trying to peek at the HPET in this case, even though the
process isn't going to do anything useful with the data.
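
To sketch the mechanism (illustrative helpers with made-up names; the
real code is in the hunks below): vclocks_used is a bitmask in which
bit N is set the first time the timekeeping update runs with vclock
mode N, so "was this vclock ever used?" reduces to a mask test:

static inline void mark_vclock_used(int vclock_mode)
{
	/* Set once, never cleared: this mode has been selected at least once. */
	WRITE_ONCE(vclocks_used, READ_ONCE(vclocks_used) | (1 << vclock_mode));
}

static inline bool vclock_ever_used(int vclock)
{
	return READ_ONCE(vclocks_used) & (1 << vclock);
}

On a machine that has only ever used the TSC vclock, vclock_ever_used(VCLOCK_HPET)
stays false, so a stray read of the vvar HPET page gets SIGBUS instead of an
uncached mapping of the HPET registers.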

Signed-off-by: Andy Lutomirski <[email protected]>
---
arch/x86/entry/vdso/vma.c | 4 ++--
arch/x86/entry/vsyscall/vsyscall_gtod.c | 9 ++++++++-
arch/x86/include/asm/clocksource.h | 9 +++++----
arch/x86/include/asm/vgtod.h | 6 ++++++
4 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 02221e98b83f..aa4f5b99f1f3 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -130,7 +130,7 @@ static int vvar_fault(const struct vm_special_mapping *sm,
__pa_symbol(&__vvar_page) >> PAGE_SHIFT);
} else if (sym_offset == image->sym_hpet_page) {
#ifdef CONFIG_HPET_TIMER
- if (hpet_address) {
+ if (hpet_address && vclock_was_used(VCLOCK_HPET)) {
ret = vm_insert_pfn_prot(
vma,
(unsigned long)vmf->virtual_address,
@@ -141,7 +141,7 @@ static int vvar_fault(const struct vm_special_mapping *sm,
} else if (sym_offset == image->sym_pvclock_page) {
struct pvclock_vsyscall_time_info *pvti =
pvclock_pvti_cpu0_va();
- if (pvti) {
+ if (pvti && vclock_was_used(VCLOCK_PVCLOCK)) {
ret = vm_insert_pfn(
vma,
(unsigned long)vmf->virtual_address,
diff --git a/arch/x86/entry/vsyscall/vsyscall_gtod.c b/arch/x86/entry/vsyscall/vsyscall_gtod.c
index 51e330416995..0fb3a104ac62 100644
--- a/arch/x86/entry/vsyscall/vsyscall_gtod.c
+++ b/arch/x86/entry/vsyscall/vsyscall_gtod.c
@@ -16,6 +16,8 @@
#include <asm/vgtod.h>
#include <asm/vvar.h>

+int vclocks_used __read_mostly;
+
DEFINE_VVAR(struct vsyscall_gtod_data, vsyscall_gtod_data);

void update_vsyscall_tz(void)
@@ -26,12 +28,17 @@ void update_vsyscall_tz(void)

void update_vsyscall(struct timekeeper *tk)
{
+ int vclock_mode = tk->tkr_mono.clock->archdata.vclock_mode;
struct vsyscall_gtod_data *vdata = &vsyscall_gtod_data;

+ /* Mark the new vclock used. */
+ BUILD_BUG_ON(VCLOCK_MAX >= 32);
+ WRITE_ONCE(vclocks_used, READ_ONCE(vclocks_used) | (1 << vclock_mode));
+
gtod_write_begin(vdata);

/* copy vsyscall data */
- vdata->vclock_mode = tk->tkr_mono.clock->archdata.vclock_mode;
+ vdata->vclock_mode = vclock_mode;
vdata->cycle_last = tk->tkr_mono.cycle_last;
vdata->mask = tk->tkr_mono.mask;
vdata->mult = tk->tkr_mono.mult;
diff --git a/arch/x86/include/asm/clocksource.h b/arch/x86/include/asm/clocksource.h
index eda81dc0f4ae..d194266acb28 100644
--- a/arch/x86/include/asm/clocksource.h
+++ b/arch/x86/include/asm/clocksource.h
@@ -3,10 +3,11 @@
#ifndef _ASM_X86_CLOCKSOURCE_H
#define _ASM_X86_CLOCKSOURCE_H

-#define VCLOCK_NONE 0 /* No vDSO clock available. */
-#define VCLOCK_TSC 1 /* vDSO should use vread_tsc. */
-#define VCLOCK_HPET 2 /* vDSO should use vread_hpet. */
-#define VCLOCK_PVCLOCK 3 /* vDSO should use vread_pvclock. */
+#define VCLOCK_NONE 0 /* No vDSO clock available. */
+#define VCLOCK_TSC 1 /* vDSO should use vread_tsc. */
+#define VCLOCK_HPET 2 /* vDSO should use vread_hpet. */
+#define VCLOCK_PVCLOCK 3 /* vDSO should use vread_pvclock. */
+#define VCLOCK_MAX 3

struct arch_clocksource_data {
int vclock_mode;
diff --git a/arch/x86/include/asm/vgtod.h b/arch/x86/include/asm/vgtod.h
index f556c4843aa1..e728699db774 100644
--- a/arch/x86/include/asm/vgtod.h
+++ b/arch/x86/include/asm/vgtod.h
@@ -37,6 +37,12 @@ struct vsyscall_gtod_data {
};
extern struct vsyscall_gtod_data vsyscall_gtod_data;

+extern int vclocks_used;
+static inline bool vclock_was_used(int vclock)
+{
+ return READ_ONCE(vclocks_used) & (1 << vclock);
+}
+
static inline unsigned gtod_read_begin(const struct vsyscall_gtod_data *s)
{
unsigned ret;
--
2.5.0

2015-12-23 23:57:13

by Andy Lutomirski

Subject: Re: [PATCH v2 0/6] mm, x86/vdso: Special IO mapping improvements

Hi Oleg and Kees-

I meant to cc you on this in the first place, but I failed. If you
have a few minutes, want to take a peek at these and see if you can
poke any holes in them? I'm reasonably confident that they're a
considerable improvement over the old state of affairs, but they might
still not be perfect.

Let me know if you want me to email out a fresh copy. This series
applies to tip:x86/asm.

--Andy

On Mon, Dec 14, 2015 at 10:31 AM, Andy Lutomirski <[email protected]> wrote:
> This applies on top of the earlier vdso pvclock series I sent out.
> Once that lands in -tip, this will apply to -tip.
>
> This series cleans up the hack that is our vvar mapping. We currently
> initialize the vvar mapping as a special mapping vma backed by nothing
> whatsoever and then we abuse remap_pfn_range to populate it.
>
> This cheats the mm core, probably breaks under various evil madvise
> workloads, and prevents handling faults in more interesting ways.
>
> To clean it up, this series:
>
> - Adds a special mapping .fault operation
> - Adds a vm_insert_pfn_prot helper
> - Uses the new .fault infrastructure in x86's vdso and vvar mappings
> - Hardens the HPET mapping, mitigating an HW attack surface that bothers me
>
> akpm, can you ack patch 1?
>
> Changes from v1:
> - Lots of changelog clarification requested by akpm
> - Minor tweaks to style and comments in the first two patches
>
> Andy Lutomirski (6):
> mm: Add a vm_special_mapping .fault method
> mm: Add vm_insert_pfn_prot
> x86/vdso: Track each mm's loaded vdso image as well as its base
> x86,vdso: Use .fault for the vdso text mapping
> x86,vdso: Use .fault instead of remap_pfn_range for the vvar mapping
> x86/vdso: Disallow vvar access to vclock IO for never-used vclocks
>
> arch/x86/entry/vdso/vdso2c.h | 7 --
> arch/x86/entry/vdso/vma.c | 124 ++++++++++++++++++++------------
> arch/x86/entry/vsyscall/vsyscall_gtod.c | 9 ++-
> arch/x86/include/asm/clocksource.h | 9 +--
> arch/x86/include/asm/mmu.h | 3 +-
> arch/x86/include/asm/vdso.h | 3 -
> arch/x86/include/asm/vgtod.h | 6 ++
> include/linux/mm.h | 2 +
> include/linux/mm_types.h | 22 +++++-
> mm/memory.c | 25 ++++++-
> mm/mmap.c | 13 ++--
> 11 files changed, 151 insertions(+), 72 deletions(-)
>
> --
> 2.5.0
>



--
Andy Lutomirski
AMA Capital Management, LLC

2015-12-29 13:13:12

by Andy Lutomirski

Subject: Re: [PATCH v2 0/6] mm, x86/vdso: Special IO mapping improvements

On Wed, Dec 23, 2015 at 3:56 PM, Andy Lutomirski <[email protected]> wrote:
> Hi Oleg and Kees-
>
> I meant to cc you on this in the first place, but I failed. If you
> have a few minutes, want to take a peek at these and see if you can
> poke any holes in them? I'm reasonably confident that they're a
> considerable improvement over the old state of affairs, but they might
> still not be perfect.
>
> Let me know if you want me to email out a fresh copy. This series
> applies to tip:x86/asm.

Hi -tip people:

Please don't apply this series. It has a race. I'll send v3.

--Andy

>
> --Andy
>



--
Andy Lutomirski
AMA Capital Management, LLC