2015-08-29 15:20:38

by Brian Gerst

Subject: [PATCH 0/7] x86 vdso32 cleanups

This patch set contains several cleanups to the 32-bit VDSO. The
main change is to build only one VDSO image and to select the syscall
entry point at runtime.

arch/x86/entry/vdso/.gitignore | 4 +---
arch/x86/entry/vdso/Makefile | 53 ++++++++++++++++++++++-------------------------------
arch/x86/entry/vdso/{vdso32 => }/int80.S | 13 +------------
arch/x86/entry/vdso/{vdso32 => }/sigreturn.S | 9 +++++++--
arch/x86/entry/vdso/{vdso32 => }/syscall.S | 23 +++++------------------
arch/x86/entry/vdso/{vdso32 => }/sysenter.S | 19 +++++--------------
arch/x86/entry/vdso/vclock_gettime.c | 31 +++++++++++++++++++++++++++++++
arch/x86/entry/vdso/vdso-note.S | 32 +++++++++++++++++++++++++++++++-
arch/x86/entry/vdso/vdso2c.c | 2 ++
arch/x86/entry/vdso/vdso32-setup.c | 15 ++++++++-------
arch/x86/entry/vdso/{vdso32 => }/vdso32.lds.S | 2 +-
arch/x86/entry/vdso/vdso32/.gitignore | 1 -
arch/x86/entry/vdso/vdso32/note.S | 44 --------------------------------------------
arch/x86/entry/vdso/vdso32/vclock_gettime.c | 30 ------------------------------
arch/x86/entry/vdso/vdso32/vdso-fakesections.c | 1 -
arch/x86/entry/vdso/vma.c | 6 +++---
arch/x86/ia32/ia32_signal.c | 4 ++--
arch/x86/include/asm/elf.h | 3 +--
arch/x86/include/asm/vdso.h | 20 +++++++++++++-------
arch/x86/kernel/signal.c | 4 ++--
arch/x86/xen/setup.c | 13 ++-----------
arch/x86/xen/vdso.h | 4 ----
22 files changed, 137 insertions(+), 196 deletions(-)
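As a rough illustration of the runtime selection this series introduces (hypothetical names and simplified logic; the real code is `sysenter_setup()` in vdso32-setup.c in patch 4):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the kernel's vdso_image symbol offsets
 * for the three 32-bit syscall entry stubs. */
struct vdso_image {
	long sym___kernel_vsyscall;          /* int $0x80 entry */
	long sym___kernel_vsyscall_syscall;  /* AMD syscall entry */
	long sym___kernel_vsyscall_sysenter; /* Intel sysenter entry */
};

/* Pick the entry point the way sysenter_setup() does: prefer
 * syscall (compat on 64-bit), then sysenter, then int $0x80. */
static long select_vsyscall(const struct vdso_image *img,
			    bool have_syscall32, bool have_sysenter)
{
	if (have_syscall32)
		return img->sym___kernel_vsyscall_syscall;
	if (have_sysenter)
		return img->sym___kernel_vsyscall_sysenter;
	return img->sym___kernel_vsyscall;
}
```

The selected offset is what ends up in the AT_SYSINFO vector instead of a whole separate vDSO image.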


2015-08-29 15:22:54

by Brian Gerst

Subject: [PATCH 1/7] x86/vdso32: Separate sigreturn code

Compile the sigreturn code into a separate sigreturn.o instead of
including it in each of the three syscall entry stub files. Use the
alternatives mechanism to patch in a syscall instruction when the CPU
supports it.
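For context, a minimal sketch of what the alternatives mechanism does at boot (hypothetical structure and names; the real implementation lives in arch/x86/kernel/alternative.c and also fixes up relative addresses, padding, etc.):

```c
#include <assert.h>
#include <string.h>
#include <stddef.h>

/* Each ALTERNATIVE site records the original and replacement
 * instruction bytes plus the CPU feature bit that gates the swap. */
struct alt_entry {
	unsigned char *site;       /* location to patch */
	const unsigned char *repl; /* replacement bytes */
	size_t len;                /* instruction length */
	int feature;               /* required CPU feature bit */
};

/* At boot, walk the table and overwrite each site whose feature
 * bit is present in the CPU feature mask. */
static void apply_alternatives(struct alt_entry *alts, size_t n,
			       unsigned long features)
{
	for (size_t i = 0; i < n; i++)
		if (features & (1UL << alts[i].feature))
			memcpy(alts[i].site, alts[i].repl, alts[i].len);
}
```

In the patch, `ALTERNATIVE "int $0x80", "syscall", X86_FEATURE_SYSCALL32` emits exactly such a record, so the sigreturn stub needs no compile-time `SYSCALL_ENTER_KERNEL` variants.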

Signed-off-by: Brian Gerst <[email protected]>
---
arch/x86/entry/vdso/Makefile | 3 ++-
arch/x86/entry/vdso/vdso32/int80.S | 5 +----
arch/x86/entry/vdso/vdso32/sigreturn.S | 9 +++++++--
arch/x86/entry/vdso/vdso32/syscall.S | 7 +------
arch/x86/entry/vdso/vdso32/sysenter.S | 5 +----
5 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index a3d0767..b4cd431 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -140,7 +140,7 @@ override obj-dirs = $(dir $(obj)) $(obj)/vdso32/

targets += vdso32/vdso32.lds
targets += vdso32/note.o vdso32/vclock_gettime.o $(vdso32.so-y:%=vdso32/%.o)
-targets += vdso32/vclock_gettime.o
+targets += vdso32/vclock_gettime.o vdso32/sigreturn.o

$(obj)/vdso32.o: $(vdso32-images:%=$(obj)/%)

@@ -163,6 +163,7 @@ $(vdso32-images:%=$(obj)/%.dbg): $(obj)/vdso32-%.so.dbg: FORCE \
$(obj)/vdso32/vdso32.lds \
$(obj)/vdso32/vclock_gettime.o \
$(obj)/vdso32/note.o \
+ $(obj)/vdso32/sigreturn.o \
$(obj)/vdso32/%.o
$(call if_changed,vdso)

diff --git a/arch/x86/entry/vdso/vdso32/int80.S b/arch/x86/entry/vdso/vdso32/int80.S
index b15b7c0..e40af1c 100644
--- a/arch/x86/entry/vdso/vdso32/int80.S
+++ b/arch/x86/entry/vdso/vdso32/int80.S
@@ -1,10 +1,7 @@
/*
* Code for the vDSO. This version uses the old int $0x80 method.
- *
- * First get the common code for the sigreturn entry points.
- * This must come first.
*/
-#include "sigreturn.S"
+#include <linux/linkage.h>

.text
.globl __kernel_vsyscall
diff --git a/arch/x86/entry/vdso/vdso32/sigreturn.S b/arch/x86/entry/vdso/vdso32/sigreturn.S
index d7ec4e2..ca0e6ca 100644
--- a/arch/x86/entry/vdso/vdso32/sigreturn.S
+++ b/arch/x86/entry/vdso/vdso32/sigreturn.S
@@ -9,9 +9,14 @@
#include <linux/linkage.h>
#include <asm/unistd_32.h>
#include <asm/asm-offsets.h>
+#include <asm/alternative-asm.h>
+#include <asm/cpufeature.h>

-#ifndef SYSCALL_ENTER_KERNEL
-#define SYSCALL_ENTER_KERNEL int $0x80
+#ifdef CONFIG_COMPAT
+#define SYSCALL_ENTER_KERNEL \
+ ALTERNATIVE "int $0x80", "syscall", X86_FEATURE_SYSCALL32
+#else
+#define SYSCALL_ENTER_KERNEL int $0x80
#endif

.text
diff --git a/arch/x86/entry/vdso/vdso32/syscall.S b/arch/x86/entry/vdso/vdso32/syscall.S
index 6b286bb..75545ec 100644
--- a/arch/x86/entry/vdso/vdso32/syscall.S
+++ b/arch/x86/entry/vdso/vdso32/syscall.S
@@ -1,12 +1,7 @@
/*
* Code for the vDSO. This version uses the syscall instruction.
- *
- * First get the common code for the sigreturn entry points.
- * This must come first.
*/
-#define SYSCALL_ENTER_KERNEL syscall
-#include "sigreturn.S"
-
+#include <linux/linkage.h>
#include <asm/segment.h>

.text
diff --git a/arch/x86/entry/vdso/vdso32/sysenter.S b/arch/x86/entry/vdso/vdso32/sysenter.S
index e354bce..e99c7699 100644
--- a/arch/x86/entry/vdso/vdso32/sysenter.S
+++ b/arch/x86/entry/vdso/vdso32/sysenter.S
@@ -1,10 +1,7 @@
/*
* Code for the vDSO. This version uses the sysenter instruction.
- *
- * First get the common code for the sigreturn entry points.
- * This must come first.
*/
-#include "sigreturn.S"
+#include <linux/linkage.h>

/*
* The caller puts arg2 in %ecx, which gets pushed. The kernel will use
--
2.4.3

2015-08-29 15:20:44

by Brian Gerst

Subject: [PATCH 2/7] x86/vdso32: Remove VDSO32_vsyscall_eh_frame_size

This symbol and the padding are unnecessary since we no longer rely
on the symbols being exactly the same in each variant of the vdso32.

Signed-off-by: Brian Gerst <[email protected]>
---
arch/x86/entry/vdso/vdso32/int80.S | 8 --------
arch/x86/entry/vdso/vdso32/syscall.S | 8 --------
arch/x86/entry/vdso/vdso32/sysenter.S | 6 ------
3 files changed, 22 deletions(-)

diff --git a/arch/x86/entry/vdso/vdso32/int80.S b/arch/x86/entry/vdso/vdso32/int80.S
index e40af1c..667b25e 100644
--- a/arch/x86/entry/vdso/vdso32/int80.S
+++ b/arch/x86/entry/vdso/vdso32/int80.S
@@ -43,11 +43,3 @@ __kernel_vsyscall:
.align 4
.LENDFDEDLSI:
.previous
-
- /*
- * Pad out the segment to match the size of the sysenter.S version.
- */
-VDSO32_vsyscall_eh_frame_size = 0x40
- .section .data,"aw",@progbits
- .space VDSO32_vsyscall_eh_frame_size-(.LENDFDEDLSI-.LSTARTFRAMEDLSI), 0
- .previous
diff --git a/arch/x86/entry/vdso/vdso32/syscall.S b/arch/x86/entry/vdso/vdso32/syscall.S
index 75545ec..73f1428 100644
--- a/arch/x86/entry/vdso/vdso32/syscall.S
+++ b/arch/x86/entry/vdso/vdso32/syscall.S
@@ -60,11 +60,3 @@ __kernel_vsyscall:
.align 4
.LENDFDE1:
.previous
-
- /*
- * Pad out the segment to match the size of the sysenter.S version.
- */
-VDSO32_vsyscall_eh_frame_size = 0x40
- .section .data,"aw",@progbits
- .space VDSO32_vsyscall_eh_frame_size-(.LENDFDE1-.LSTARTFRAME), 0
- .previous
diff --git a/arch/x86/entry/vdso/vdso32/sysenter.S b/arch/x86/entry/vdso/vdso32/sysenter.S
index e99c7699..e8e3080 100644
--- a/arch/x86/entry/vdso/vdso32/sysenter.S
+++ b/arch/x86/entry/vdso/vdso32/sysenter.S
@@ -105,9 +105,3 @@ VDSO32_SYSENTER_RETURN: /* Symbol used by sysenter.c via vdso32-syms.h */
.align 4
.LENDFDEDLSI:
.previous
-
- /*
- * Emit a symbol with the size of this .eh_frame data,
- * to verify it matches the other versions.
- */
-VDSO32_vsyscall_eh_frame_size = (.LENDFDEDLSI-.LSTARTFRAMEDLSI)
--
2.4.3

2015-08-29 15:20:42

by Brian Gerst

Subject: [PATCH 3/7] x86/vdso32: Remove unused vdso-fakesections.c

Signed-off-by: Brian Gerst <[email protected]>
---
arch/x86/entry/vdso/vdso32/vdso-fakesections.c | 1 -
1 file changed, 1 deletion(-)
delete mode 100644 arch/x86/entry/vdso/vdso32/vdso-fakesections.c

diff --git a/arch/x86/entry/vdso/vdso32/vdso-fakesections.c b/arch/x86/entry/vdso/vdso32/vdso-fakesections.c
deleted file mode 100644
index 541468e..0000000
--- a/arch/x86/entry/vdso/vdso32/vdso-fakesections.c
+++ /dev/null
@@ -1 +0,0 @@
-#include "../vdso-fakesections.c"
--
2.4.3

2015-08-29 15:21:27

by Brian Gerst

Subject: [PATCH 4/7] x86/vdso32: Build single vdso32 image

Currently, there are three images that are built for vdso32, differing
only in the syscall entry code. The syscall entry is a tiny fraction of
the total code, so most of the vdso code is duplicated in memory three
times. This patch merges all the syscall entry points into a single
image; instead of selecting an image at boot, the kernel now selects
which entry point is exposed through the AT_SYSINFO auxiliary vector
and the ELF entry point.
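The ELF entry fixup the patch performs in `sysenter_setup()` amounts to rewriting one header field in the in-kernel copy of the image. A minimal sketch (simplified header struct; the real code casts `vdso_image_32.data` to `struct elf32_hdr *`):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the ELF32 header: only the leading fields,
 * enough to show the e_entry fixup (the real Elf32_Ehdr has more). */
struct elf32_hdr_min {
	unsigned char e_ident[16];
	uint16_t e_type, e_machine;
	uint32_t e_version;
	uint32_t e_entry; /* program entry point */
};

/* Point the image's ELF entry at the chosen vsyscall stub, done
 * once at boot so ELF loaders and AT_SYSINFO agree on the entry. */
static void set_vdso_entry(void *image, uint32_t vsyscall_offset)
{
	((struct elf32_hdr_min *)image)->e_entry = vsyscall_offset;
}
```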

Signed-off-by: Brian Gerst <[email protected]>
---
arch/x86/entry/vdso/.gitignore | 3 ---
arch/x86/entry/vdso/Makefile | 44 ++++++++++++-----------------------
arch/x86/entry/vdso/vdso2c.c | 2 ++
arch/x86/entry/vdso/vdso32-setup.c | 15 ++++++------
arch/x86/entry/vdso/vdso32/syscall.S | 8 +++----
arch/x86/entry/vdso/vdso32/sysenter.S | 8 +++----
arch/x86/entry/vdso/vma.c | 6 ++---
arch/x86/ia32/ia32_signal.c | 4 ++--
arch/x86/include/asm/elf.h | 3 +--
arch/x86/include/asm/vdso.h | 11 ++++-----
arch/x86/kernel/signal.c | 4 ++--
arch/x86/xen/setup.c | 12 ++--------
12 files changed, 47 insertions(+), 73 deletions(-)

diff --git a/arch/x86/entry/vdso/.gitignore b/arch/x86/entry/vdso/.gitignore
index aae8ffd..a6a6ca8 100644
--- a/arch/x86/entry/vdso/.gitignore
+++ b/arch/x86/entry/vdso/.gitignore
@@ -1,7 +1,4 @@
vdso.lds
vdsox32.lds
-vdso32-syscall-syms.lds
-vdso32-sysenter-syms.lds
-vdso32-int80-syms.lds
vdso-image-*.c
vdso2c
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index b4cd431..282121a 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -19,9 +19,7 @@ obj-y += vma.o
# vDSO images to build
vdso_img-$(VDSO64-y) += 64
vdso_img-$(VDSOX32-y) += x32
-vdso_img-$(VDSO32-y) += 32-int80
-vdso_img-$(CONFIG_IA32_EMULATION) += 32-syscall
-vdso_img-$(VDSO32-y) += 32-sysenter
+vdso_img-$(VDSO32-y) += 32

obj-$(VDSO32-y) += vdso32-setup.o

@@ -122,15 +120,6 @@ $(obj)/%.so: $(obj)/%.so.dbg
$(obj)/vdsox32.so.dbg: $(src)/vdsox32.lds $(vobjx32s) FORCE
$(call if_changed,vdso)

-#
-# Build multiple 32-bit vDSO images to choose from at boot time.
-#
-vdso32.so-$(VDSO32-y) += int80
-vdso32.so-$(CONFIG_IA32_EMULATION) += syscall
-vdso32.so-$(VDSO32-y) += sysenter
-
-vdso32-images = $(vdso32.so-y:%=vdso32-%.so)
-
CPPFLAGS_vdso32.lds = $(CPPFLAGS_vdso.lds)
VDSO_LDFLAGS_vdso32.lds = -m32 -Wl,-m,elf_i386 -Wl,-soname=linux-gate.so.1

@@ -138,15 +127,9 @@ VDSO_LDFLAGS_vdso32.lds = -m32 -Wl,-m,elf_i386 -Wl,-soname=linux-gate.so.1
# is not a kbuild sub-make subdirectory.
override obj-dirs = $(dir $(obj)) $(obj)/vdso32/

-targets += vdso32/vdso32.lds
-targets += vdso32/note.o vdso32/vclock_gettime.o $(vdso32.so-y:%=vdso32/%.o)
-targets += vdso32/vclock_gettime.o vdso32/sigreturn.o
-
-$(obj)/vdso32.o: $(vdso32-images:%=$(obj)/%)
-
KBUILD_AFLAGS_32 := $(filter-out -m64,$(KBUILD_AFLAGS))
-$(vdso32-images:%=$(obj)/%.dbg): KBUILD_AFLAGS = $(KBUILD_AFLAGS_32)
-$(vdso32-images:%=$(obj)/%.dbg): asflags-$(CONFIG_X86_64) += -m32
+$(obj)/vdso32.so.dbg: KBUILD_AFLAGS = $(KBUILD_AFLAGS_32)
+$(obj)/vdso32.so.dbg: asflags-$(CONFIG_X86_64) += -m32

KBUILD_CFLAGS_32 := $(filter-out -m64,$(KBUILD_CFLAGS))
KBUILD_CFLAGS_32 := $(filter-out -mcmodel=kernel,$(KBUILD_CFLAGS_32))
@@ -157,14 +140,17 @@ KBUILD_CFLAGS_32 += $(call cc-option, -fno-stack-protector)
KBUILD_CFLAGS_32 += $(call cc-option, -foptimize-sibling-calls)
KBUILD_CFLAGS_32 += -fno-omit-frame-pointer
KBUILD_CFLAGS_32 += -DDISABLE_BRANCH_PROFILING
-$(vdso32-images:%=$(obj)/%.dbg): KBUILD_CFLAGS = $(KBUILD_CFLAGS_32)
-
-$(vdso32-images:%=$(obj)/%.dbg): $(obj)/vdso32-%.so.dbg: FORCE \
- $(obj)/vdso32/vdso32.lds \
- $(obj)/vdso32/vclock_gettime.o \
- $(obj)/vdso32/note.o \
- $(obj)/vdso32/sigreturn.o \
- $(obj)/vdso32/%.o
+
+vobjs32-y := vdso32/vclock_gettime.o vdso32/note.o vdso32/sigreturn.o
+vobjs32-y += vdso32/int80.o vdso32/sysenter.o
+vobjs32-$(CONFIG_COMPAT) += vdso32/syscall.o
+
+vobjs32 := $(foreach F,$(vobjs32-y),$(obj)/$F)
+
+targets += vdso32/vdso32.lds $(vobjs32-y)
+
+$(obj)/vdso32.so.dbg: KBUILD_CFLAGS = $(KBUILD_CFLAGS_32)
+$(obj)/vdso32.so.dbg: $(obj)/vdso32/vdso32.lds $(vobjs32) FORCE
$(call if_changed,vdso)

#
@@ -207,4 +193,4 @@ $(vdso_img_insttargets): install_%: $(obj)/%.dbg $(MODLIB)/vdso FORCE
PHONY += vdso_install $(vdso_img_insttargets)
vdso_install: $(vdso_img_insttargets) FORCE

-clean-files := vdso32-syscall* vdso32-sysenter* vdso32-int80* vdso64* vdso-image-*.c vdsox32.so*
+clean-files := vdso32* vdso64* vdso-image-*.c vdsox32.so*
diff --git a/arch/x86/entry/vdso/vdso2c.c b/arch/x86/entry/vdso/vdso2c.c
index 8627db2..cda5fa8 100644
--- a/arch/x86/entry/vdso/vdso2c.c
+++ b/arch/x86/entry/vdso/vdso2c.c
@@ -100,6 +100,8 @@ struct vdso_sym required_syms[] = {
{"VDSO32_NOTE_MASK", true},
{"VDSO32_SYSENTER_RETURN", true},
{"__kernel_vsyscall", true},
+ {"__kernel_vsyscall_syscall", true},
+ {"__kernel_vsyscall_sysenter", true},
{"__kernel_sigreturn", true},
{"__kernel_rt_sigreturn", true},
};
diff --git a/arch/x86/entry/vdso/vdso32-setup.c b/arch/x86/entry/vdso/vdso32-setup.c
index e904c27..d644762 100644
--- a/arch/x86/entry/vdso/vdso32-setup.c
+++ b/arch/x86/entry/vdso/vdso32-setup.c
@@ -10,6 +10,7 @@
#include <linux/smp.h>
#include <linux/kernel.h>
#include <linux/mm_types.h>
+#include <linux/elf.h>

#include <asm/cpufeature.h>
#include <asm/processor.h>
@@ -60,23 +61,23 @@ __setup_param("vdso=", vdso_setup, vdso32_setup, 0);

#endif /* CONFIG_X86_64 */

-#if defined(CONFIG_X86_32) || defined(CONFIG_COMPAT)
-const struct vdso_image *selected_vdso32;
-#endif
+unsigned long selected_vsyscall;

int __init sysenter_setup(void)
{
#ifdef CONFIG_COMPAT
if (vdso32_syscall())
- selected_vdso32 = &vdso_image_32_syscall;
+ selected_vsyscall = vdso_image_32.sym___kernel_vsyscall_syscall;
else
#endif
if (vdso32_sysenter())
- selected_vdso32 = &vdso_image_32_sysenter;
+ selected_vsyscall = vdso_image_32.sym___kernel_vsyscall_sysenter;
else
- selected_vdso32 = &vdso_image_32_int80;
+ selected_vsyscall = vdso_image_32.sym___kernel_vsyscall;
+
+ ((struct elf32_hdr *)vdso_image_32.data)->e_entry = selected_vsyscall;

- init_vdso_image(selected_vdso32);
+ init_vdso_image(&vdso_image_32);

return 0;
}
diff --git a/arch/x86/entry/vdso/vdso32/syscall.S b/arch/x86/entry/vdso/vdso32/syscall.S
index 73f1428..50490c8 100644
--- a/arch/x86/entry/vdso/vdso32/syscall.S
+++ b/arch/x86/entry/vdso/vdso32/syscall.S
@@ -5,10 +5,10 @@
#include <asm/segment.h>

.text
- .globl __kernel_vsyscall
- .type __kernel_vsyscall,@function
+ .globl __kernel_vsyscall_syscall
+ .type __kernel_vsyscall_syscall,@function
ALIGN
-__kernel_vsyscall:
+__kernel_vsyscall_syscall:
.LSTART_vsyscall:
push %ebp
.Lpush_ebp:
@@ -19,7 +19,7 @@ __kernel_vsyscall:
.Lpop_ebp:
ret
.LEND_vsyscall:
- .size __kernel_vsyscall,.-.LSTART_vsyscall
+ .size __kernel_vsyscall_syscall,.-.LSTART_vsyscall

.section .eh_frame,"a",@progbits
.LSTARTFRAME:
diff --git a/arch/x86/entry/vdso/vdso32/sysenter.S b/arch/x86/entry/vdso/vdso32/sysenter.S
index e8e3080..458954a 100644
--- a/arch/x86/entry/vdso/vdso32/sysenter.S
+++ b/arch/x86/entry/vdso/vdso32/sysenter.S
@@ -22,10 +22,10 @@
* three words on the parent stack do not get copied to the child.
*/
.text
- .globl __kernel_vsyscall
- .type __kernel_vsyscall,@function
+ .globl __kernel_vsyscall_sysenter
+ .type __kernel_vsyscall_sysenter,@function
ALIGN
-__kernel_vsyscall:
+__kernel_vsyscall_sysenter:
.LSTART_vsyscall:
push %ecx
.Lpush_ecx:
@@ -51,7 +51,7 @@ VDSO32_SYSENTER_RETURN: /* Symbol used by sysenter.c via vdso32-syms.h */
.Lpop_ecx:
ret
.LEND_vsyscall:
- .size __kernel_vsyscall,.-.LSTART_vsyscall
+ .size __kernel_vsyscall_sysenter,.-.LSTART_vsyscall
.previous

.section .eh_frame,"a",@progbits
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 4345431..c726d49 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -185,14 +185,14 @@ static int load_vdso32(void)
if (vdso32_enabled != 1) /* Other values all mean "disabled" */
return 0;

- ret = map_vdso(selected_vdso32, false);
+ ret = map_vdso(&vdso_image_32, false);
if (ret)
return ret;

- if (selected_vdso32->sym_VDSO32_SYSENTER_RETURN)
+ if (vdso_image_32.sym_VDSO32_SYSENTER_RETURN)
current_thread_info()->sysenter_return =
current->mm->context.vdso +
- selected_vdso32->sym_VDSO32_SYSENTER_RETURN;
+ vdso_image_32.sym_VDSO32_SYSENTER_RETURN;

return 0;
}
diff --git a/arch/x86/ia32/ia32_signal.c b/arch/x86/ia32/ia32_signal.c
index a0a19b7..e6a5c275 100644
--- a/arch/x86/ia32/ia32_signal.c
+++ b/arch/x86/ia32/ia32_signal.c
@@ -289,7 +289,7 @@ int ia32_setup_frame(int sig, struct ksignal *ksig,
/* Return stub is in 32bit vsyscall page */
if (current->mm->context.vdso)
restorer = current->mm->context.vdso +
- selected_vdso32->sym___kernel_sigreturn;
+ vdso_image_32.sym___kernel_sigreturn;
else
restorer = &frame->retcode;
}
@@ -368,7 +368,7 @@ int ia32_setup_rt_frame(int sig, struct ksignal *ksig,
restorer = ksig->ka.sa.sa_restorer;
else
restorer = current->mm->context.vdso +
- selected_vdso32->sym___kernel_rt_sigreturn;
+ vdso_image_32.sym___kernel_rt_sigreturn;
put_user_ex(ptr_to_compat(restorer), &frame->pretcode);

/*
diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index 141c561..ccc1d31 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -327,8 +327,7 @@ else \
#define VDSO_CURRENT_BASE ((unsigned long)current->mm->context.vdso)

#define VDSO_ENTRY \
- ((unsigned long)current->mm->context.vdso + \
- selected_vdso32->sym___kernel_vsyscall)
+ ((unsigned long)current->mm->context.vdso + selected_vsyscall)

struct linux_binprm;

diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index 8021bd2..16d5c18 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -26,6 +26,8 @@ struct vdso_image {
long sym___kernel_sigreturn;
long sym___kernel_rt_sigreturn;
long sym___kernel_vsyscall;
+ long sym___kernel_vsyscall_syscall;
+ long sym___kernel_vsyscall_sysenter;
long sym_VDSO32_SYSENTER_RETURN;
};

@@ -38,13 +40,8 @@ extern const struct vdso_image vdso_image_x32;
#endif

#if defined CONFIG_X86_32 || defined CONFIG_COMPAT
-extern const struct vdso_image vdso_image_32_int80;
-#ifdef CONFIG_COMPAT
-extern const struct vdso_image vdso_image_32_syscall;
-#endif
-extern const struct vdso_image vdso_image_32_sysenter;
-
-extern const struct vdso_image *selected_vdso32;
+extern const struct vdso_image vdso_image_32;
+extern unsigned long selected_vsyscall;
#endif

extern void __init init_vdso_image(const struct vdso_image *image);
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index da52e6b..d87ce92 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -299,7 +299,7 @@ __setup_frame(int sig, struct ksignal *ksig, sigset_t *set,

if (current->mm->context.vdso)
restorer = current->mm->context.vdso +
- selected_vdso32->sym___kernel_sigreturn;
+ vdso_image_32.sym___kernel_sigreturn;
else
restorer = &frame->retcode;
if (ksig->ka.sa.sa_flags & SA_RESTORER)
@@ -363,7 +363,7 @@ static int __setup_rt_frame(int sig, struct ksignal *ksig,

/* Set up to return from userspace. */
restorer = current->mm->context.vdso +
- selected_vdso32->sym___kernel_rt_sigreturn;
+ vdso_image_32.sym___kernel_rt_sigreturn;
if (ksig->ka.sa.sa_flags & SA_RESTORER)
restorer = ksig->ka.sa.sa_restorer;
put_user_ex(restorer, &frame->pretcode);
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 55f388e..b166ffd 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -753,17 +753,9 @@ char * __init xen_auto_xlated_memory_setup(void)
static void __init fiddle_vdso(void)
{
#ifdef CONFIG_X86_32
- /*
- * This could be called before selected_vdso32 is initialized, so
- * just fiddle with both possible images. vdso_image_32_syscall
- * can't be selected, since it only exists on 64-bit systems.
- */
u32 *mask;
- mask = vdso_image_32_int80.data +
- vdso_image_32_int80.sym_VDSO32_NOTE_MASK;
- *mask |= 1 << VDSO_NOTE_NONEGSEG_BIT;
- mask = vdso_image_32_sysenter.data +
- vdso_image_32_sysenter.sym_VDSO32_NOTE_MASK;
+ mask = vdso_image_32.data +
+ vdso_image_32.sym_VDSO32_NOTE_MASK;
*mask |= 1 << VDSO_NOTE_NONEGSEG_BIT;
#endif
}
--
2.4.3

2015-08-29 15:21:23

by Brian Gerst

Subject: [PATCH 5/7] x86/vdso: Merge 32-bit and 64-bit source files

Merge the 32-bit versions of vclock_gettime.c and note.S into the
64-bit source files, and add make rules to build the combined code.
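The merged vclock_gettime.c relies on the "fake a 32-bit kernel config" preprocessor trick visible in the diff below: when a 64-bit kernel builds the 32-bit vDSO, the 64-bit config symbols are undefined before any code that tests them. A self-contained sketch of the idea (symbols defined locally here purely for illustration):

```c
#include <assert.h>

/* Pretend we are a 64-bit kernel build producing the 32-bit vDSO. */
#define CONFIG_X86_64 1
#define BUILD_VDSO32 1

/* The override block: drop the 64-bit config and fake a 32-bit one,
 * so everything compiled after this point takes the 32-bit paths. */
#ifdef BUILD_VDSO32
# ifdef CONFIG_X86_64
#  undef CONFIG_X86_64
#  define CONFIG_X86_32 1
# endif
#endif

static int is_32bit_build(void)
{
#ifdef CONFIG_X86_32
	return 1;
#else
	return 0;
#endif
}
```

Because the overrides happen before the headers and shared code are included, a single source file can serve both vDSO flavors.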

Signed-off-by: Brian Gerst <[email protected]>
---
arch/x86/entry/vdso/Makefile | 10 ++++++-
arch/x86/entry/vdso/vclock_gettime.c | 31 ++++++++++++++++++++
arch/x86/entry/vdso/vdso-note.S | 34 +++++++++++++++++++++-
arch/x86/entry/vdso/vdso32/note.S | 44 -----------------------------
arch/x86/entry/vdso/vdso32/vclock_gettime.c | 30 --------------------
5 files changed, 73 insertions(+), 76 deletions(-)
delete mode 100644 arch/x86/entry/vdso/vdso32/note.S
delete mode 100644 arch/x86/entry/vdso/vdso32/vclock_gettime.c

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 282121a..a8aa0c0 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -128,6 +128,7 @@ VDSO_LDFLAGS_vdso32.lds = -m32 -Wl,-m,elf_i386 -Wl,-soname=linux-gate.so.1
override obj-dirs = $(dir $(obj)) $(obj)/vdso32/

KBUILD_AFLAGS_32 := $(filter-out -m64,$(KBUILD_AFLAGS))
+KBUILD_AFLAGS_32 += -DBUILD_VDSO32
$(obj)/vdso32.so.dbg: KBUILD_AFLAGS = $(KBUILD_AFLAGS_32)
$(obj)/vdso32.so.dbg: asflags-$(CONFIG_X86_64) += -m32

@@ -140,8 +141,9 @@ KBUILD_CFLAGS_32 += $(call cc-option, -fno-stack-protector)
KBUILD_CFLAGS_32 += $(call cc-option, -foptimize-sibling-calls)
KBUILD_CFLAGS_32 += -fno-omit-frame-pointer
KBUILD_CFLAGS_32 += -DDISABLE_BRANCH_PROFILING
+KBUILD_CFLAGS_32 += -DBUILD_VDSO32

-vobjs32-y := vdso32/vclock_gettime.o vdso32/note.o vdso32/sigreturn.o
+vobjs32-y := vclock_gettime-32.o vdso-note-32.o vdso32/sigreturn.o
vobjs32-y += vdso32/int80.o vdso32/sysenter.o
vobjs32-$(CONFIG_COMPAT) += vdso32/syscall.o

@@ -149,6 +151,12 @@ vobjs32 := $(foreach F,$(vobjs32-y),$(obj)/$F)

targets += vdso32/vdso32.lds $(vobjs32-y)

+$(obj)/%-32.o: $(src)/%.c FORCE
+ $(call if_changed_dep,cc_o_c)
+
+$(obj)/%-32.o: $(src)/%.S FORCE
+ $(call if_changed_dep,as_o_S)
+
$(obj)/vdso32.so.dbg: KBUILD_CFLAGS = $(KBUILD_CFLAGS_32)
$(obj)/vdso32.so.dbg: $(obj)/vdso32/vdso32.lds $(vobjs32) FORCE
$(call if_changed,vdso)
diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
index ca94fa6..0d1faee 100644
--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -11,6 +11,37 @@
* Check with readelf after changing.
*/

+#ifdef BUILD_VDSO32
+
+#ifndef CONFIG_CC_OPTIMIZE_FOR_SIZE
+#undef CONFIG_OPTIMIZE_INLINING
+#endif
+
+#undef CONFIG_X86_PPRO_FENCE
+
+#ifdef CONFIG_X86_64
+
+/*
+ * in case of a 32 bit VDSO for a 64 bit kernel fake a 32 bit kernel
+ * configuration
+ */
+#undef CONFIG_64BIT
+#undef CONFIG_X86_64
+#undef CONFIG_ILLEGAL_POINTER_VALUE
+#undef CONFIG_SPARSEMEM_VMEMMAP
+#undef CONFIG_NR_CPUS
+
+#define CONFIG_X86_32 1
+#define CONFIG_PAGE_OFFSET 0
+#define CONFIG_ILLEGAL_POINTER_VALUE 0
+#define CONFIG_NR_CPUS 1
+
+#define BUILD_VDSO32_64
+
+#endif /* CONFIG_X86_64 */
+
+#endif /* BUILD_VDSO32 */
+
#include <uapi/linux/time.h>
#include <asm/vgtod.h>
#include <asm/hpet.h>
diff --git a/arch/x86/entry/vdso/vdso-note.S b/arch/x86/entry/vdso/vdso-note.S
index 79a071e..eb8a6c7 100644
--- a/arch/x86/entry/vdso/vdso-note.S
+++ b/arch/x86/entry/vdso/vdso-note.S
@@ -3,10 +3,42 @@
* Here we can supply some information useful to userland.
*/

-#include <linux/uts.h>
#include <linux/version.h>
#include <linux/elfnote.h>

+/* Ideally this would use UTS_NAME, but using a quoted string here
+ doesn't work. Remember to change this when changing the
+ kernel's name. */
ELFNOTE_START(Linux, 0, "a")
.long LINUX_VERSION_CODE
ELFNOTE_END
+
+#if defined(CONFIG_XEN) && defined(BUILD_VDSO32)
+/*
+ * Add a special note telling glibc's dynamic linker a fake hardware
+ * flavor that it will use to choose the search path for libraries in the
+ * same way it uses real hardware capabilities like "mmx".
+ * We supply "nosegneg" as the fake capability, to indicate that we
+ * do not like negative offsets in instructions using segment overrides,
+ * since we implement those inefficiently. This makes it possible to
+ * install libraries optimized to avoid those access patterns in someplace
+ * like /lib/i686/tls/nosegneg. Note that an /etc/ld.so.conf.d/file
+ * corresponding to the bits here is needed to make ldconfig work right.
+ * It should contain:
+ * hwcap 1 nosegneg
+ * to match the mapping of bit to name that we give here.
+ *
+ * At runtime, the fake hardware feature will be considered to be present
+ * if its bit is set in the mask word. So, we start with the mask 0, and
+ * at boot time we set VDSO_NOTE_NONEGSEG_BIT if running under Xen.
+ */
+
+#include "../../xen/vdso.h" /* Defines VDSO_NOTE_NONEGSEG_BIT. */
+
+ELFNOTE_START(GNU, 2, "a")
+ .long 1 /* ncaps */
+VDSO32_NOTE_MASK: /* Symbol used by arch/x86/xen/setup.c */
+ .long 0 /* mask */
+ .byte VDSO_NOTE_NONEGSEG_BIT; .asciz "nosegneg" /* bit, name */
+ELFNOTE_END
+#endif
diff --git a/arch/x86/entry/vdso/vdso32/note.S b/arch/x86/entry/vdso/vdso32/note.S
deleted file mode 100644
index c83f257..0000000
--- a/arch/x86/entry/vdso/vdso32/note.S
+++ /dev/null
@@ -1,44 +0,0 @@
-/*
- * This supplies .note.* sections to go into the PT_NOTE inside the vDSO text.
- * Here we can supply some information useful to userland.
- */
-
-#include <linux/version.h>
-#include <linux/elfnote.h>
-
-/* Ideally this would use UTS_NAME, but using a quoted string here
- doesn't work. Remember to change this when changing the
- kernel's name. */
-ELFNOTE_START(Linux, 0, "a")
- .long LINUX_VERSION_CODE
-ELFNOTE_END
-
-#ifdef CONFIG_XEN
-/*
- * Add a special note telling glibc's dynamic linker a fake hardware
- * flavor that it will use to choose the search path for libraries in the
- * same way it uses real hardware capabilities like "mmx".
- * We supply "nosegneg" as the fake capability, to indicate that we
- * do not like negative offsets in instructions using segment overrides,
- * since we implement those inefficiently. This makes it possible to
- * install libraries optimized to avoid those access patterns in someplace
- * like /lib/i686/tls/nosegneg. Note that an /etc/ld.so.conf.d/file
- * corresponding to the bits here is needed to make ldconfig work right.
- * It should contain:
- * hwcap 1 nosegneg
- * to match the mapping of bit to name that we give here.
- *
- * At runtime, the fake hardware feature will be considered to be present
- * if its bit is set in the mask word. So, we start with the mask 0, and
- * at boot time we set VDSO_NOTE_NONEGSEG_BIT if running under Xen.
- */
-
-#include "../../xen/vdso.h" /* Defines VDSO_NOTE_NONEGSEG_BIT. */
-
-ELFNOTE_START(GNU, 2, "a")
- .long 1 /* ncaps */
-VDSO32_NOTE_MASK: /* Symbol used by arch/x86/xen/setup.c */
- .long 0 /* mask */
- .byte VDSO_NOTE_NONEGSEG_BIT; .asciz "nosegneg" /* bit, name */
-ELFNOTE_END
-#endif
diff --git a/arch/x86/entry/vdso/vdso32/vclock_gettime.c b/arch/x86/entry/vdso/vdso32/vclock_gettime.c
deleted file mode 100644
index 175cc72..0000000
--- a/arch/x86/entry/vdso/vdso32/vclock_gettime.c
+++ /dev/null
@@ -1,30 +0,0 @@
-#define BUILD_VDSO32
-
-#ifndef CONFIG_CC_OPTIMIZE_FOR_SIZE
-#undef CONFIG_OPTIMIZE_INLINING
-#endif
-
-#undef CONFIG_X86_PPRO_FENCE
-
-#ifdef CONFIG_X86_64
-
-/*
- * in case of a 32 bit VDSO for a 64 bit kernel fake a 32 bit kernel
- * configuration
- */
-#undef CONFIG_64BIT
-#undef CONFIG_X86_64
-#undef CONFIG_ILLEGAL_POINTER_VALUE
-#undef CONFIG_SPARSEMEM_VMEMMAP
-#undef CONFIG_NR_CPUS
-
-#define CONFIG_X86_32 1
-#define CONFIG_PAGE_OFFSET 0
-#define CONFIG_ILLEGAL_POINTER_VALUE 0
-#define CONFIG_NR_CPUS 1
-
-#define BUILD_VDSO32_64
-
-#endif
-
-#include "../vclock_gettime.c"
--
2.4.3

2015-08-29 15:21:22

by Brian Gerst

Subject: [PATCH 6/7] x86/vdso32/xen: Move VDSO_NOTE_NONEGSEG_BIT define

Xen had its own vdso.h just to define VDSO_NOTE_NONEGSEG_BIT. Move it to the
main vdso.h.

Signed-off-by: Brian Gerst <[email protected]>
---
arch/x86/entry/vdso/vdso-note.S | 4 +---
arch/x86/include/asm/vdso.h | 9 +++++++++
arch/x86/xen/setup.c | 1 -
arch/x86/xen/vdso.h | 4 ----
4 files changed, 10 insertions(+), 8 deletions(-)
delete mode 100644 arch/x86/xen/vdso.h

diff --git a/arch/x86/entry/vdso/vdso-note.S b/arch/x86/entry/vdso/vdso-note.S
index eb8a6c7..34aa574 100644
--- a/arch/x86/entry/vdso/vdso-note.S
+++ b/arch/x86/entry/vdso/vdso-note.S
@@ -5,6 +5,7 @@

#include <linux/version.h>
#include <linux/elfnote.h>
+#include <asm/vdso.h>

/* Ideally this would use UTS_NAME, but using a quoted string here
doesn't work. Remember to change this when changing the
@@ -32,9 +33,6 @@ ELFNOTE_END
* if its bit is set in the mask word. So, we start with the mask 0, and
* at boot time we set VDSO_NOTE_NONEGSEG_BIT if running under Xen.
*/
-
-#include "../../xen/vdso.h" /* Defines VDSO_NOTE_NONEGSEG_BIT. */
-
ELFNOTE_START(GNU, 2, "a")
.long 1 /* ncaps */
VDSO32_NOTE_MASK: /* Symbol used by arch/x86/xen/setup.c */
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index 16d5c18..8d9a961 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -48,4 +48,13 @@ extern void __init init_vdso_image(const struct vdso_image *image);

#endif /* __ASSEMBLER__ */

+#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
+/*
+ * Bit used for the pseudo-hwcap for non-negative segments. We use
+ * bit 1 to avoid bugs in some versions of glibc when bit 0 is
+ * used; the choice is otherwise arbitrary.
+ */
+#define VDSO_NOTE_NONEGSEG_BIT 1
+#endif
+
#endif /* _ASM_X86_VDSO_H */
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index b166ffd..79f9ed7 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -28,7 +28,6 @@
#include <xen/interface/physdev.h>
#include <xen/features.h>
#include "xen-ops.h"
-#include "vdso.h"
#include "p2m.h"
#include "mmu.h"

diff --git a/arch/x86/xen/vdso.h b/arch/x86/xen/vdso.h
deleted file mode 100644
index 861fedf..0000000
--- a/arch/x86/xen/vdso.h
+++ /dev/null
@@ -1,4 +0,0 @@
-/* Bit used for the pseudo-hwcap for non-negative segments. We use
- bit 1 to avoid bugs in some versions of glibc when bit 0 is
- used; the choice is otherwise arbitrary. */
-#define VDSO_NOTE_NONEGSEG_BIT 1
--
2.4.3

2015-08-29 15:20:59

by Brian Gerst

Subject: [PATCH 7/7] x86/vdso32: Remove vdso32 subdirectory

Since the vdso32 subdirectory doesn't have a proper Makefile, it is more
difficult to work with. Move the remaining files up one level.

Signed-off-by: Brian Gerst <[email protected]>
---
arch/x86/entry/vdso/.gitignore | 1 +
arch/x86/entry/vdso/Makefile | 14 ++-
arch/x86/entry/vdso/int80.S | 45 ++++++++++
arch/x86/entry/vdso/sigreturn.S | 150 ++++++++++++++++++++++++++++++++
arch/x86/entry/vdso/syscall.S | 62 +++++++++++++
arch/x86/entry/vdso/sysenter.S | 107 +++++++++++++++++++++++
arch/x86/entry/vdso/vdso32.lds.S | 37 ++++++++
arch/x86/entry/vdso/vdso32/.gitignore | 1 -
arch/x86/entry/vdso/vdso32/int80.S | 45 ----------
arch/x86/entry/vdso/vdso32/sigreturn.S | 150 --------------------------------
arch/x86/entry/vdso/vdso32/syscall.S | 62 -------------
arch/x86/entry/vdso/vdso32/sysenter.S | 107 -----------------------
arch/x86/entry/vdso/vdso32/vdso32.lds.S | 37 --------
13 files changed, 407 insertions(+), 411 deletions(-)
create mode 100644 arch/x86/entry/vdso/int80.S
create mode 100644 arch/x86/entry/vdso/sigreturn.S
create mode 100644 arch/x86/entry/vdso/syscall.S
create mode 100644 arch/x86/entry/vdso/sysenter.S
create mode 100644 arch/x86/entry/vdso/vdso32.lds.S
delete mode 100644 arch/x86/entry/vdso/vdso32/.gitignore
delete mode 100644 arch/x86/entry/vdso/vdso32/int80.S
delete mode 100644 arch/x86/entry/vdso/vdso32/sigreturn.S
delete mode 100644 arch/x86/entry/vdso/vdso32/syscall.S
delete mode 100644 arch/x86/entry/vdso/vdso32/sysenter.S
delete mode 100644 arch/x86/entry/vdso/vdso32/vdso32.lds.S

diff --git a/arch/x86/entry/vdso/.gitignore b/arch/x86/entry/vdso/.gitignore
index a6a6ca8..d285fe6 100644
--- a/arch/x86/entry/vdso/.gitignore
+++ b/arch/x86/entry/vdso/.gitignore
@@ -1,4 +1,5 @@
vdso.lds
vdsox32.lds
+vdso32.lds
vdso-image-*.c
vdso2c
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index a8aa0c0..45df51e 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -123,10 +123,6 @@ $(obj)/vdsox32.so.dbg: $(src)/vdsox32.lds $(vobjx32s) FORCE
CPPFLAGS_vdso32.lds = $(CPPFLAGS_vdso.lds)
VDSO_LDFLAGS_vdso32.lds = -m32 -Wl,-m,elf_i386 -Wl,-soname=linux-gate.so.1

-# This makes sure the $(obj) subdirectory exists even though vdso32/
-# is not a kbuild sub-make subdirectory.
-override obj-dirs = $(dir $(obj)) $(obj)/vdso32/
-
KBUILD_AFLAGS_32 := $(filter-out -m64,$(KBUILD_AFLAGS))
KBUILD_AFLAGS_32 += -DBUILD_VDSO32
$(obj)/vdso32.so.dbg: KBUILD_AFLAGS = $(KBUILD_AFLAGS_32)
@@ -143,13 +139,13 @@ KBUILD_CFLAGS_32 += -fno-omit-frame-pointer
KBUILD_CFLAGS_32 += -DDISABLE_BRANCH_PROFILING
KBUILD_CFLAGS_32 += -DBUILD_VDSO32

-vobjs32-y := vclock_gettime-32.o vdso-note-32.o vdso32/sigreturn.o
-vobjs32-y += vdso32/int80.o vdso32/sysenter.o
-vobjs32-$(CONFIG_COMPAT) += vdso32/syscall.o
+vobjs32-y := vclock_gettime-32.o vdso-note-32.o sigreturn.o
+vobjs32-y += int80.o sysenter.o
+vobjs32-$(CONFIG_COMPAT) += syscall.o

vobjs32 := $(foreach F,$(vobjs32-y),$(obj)/$F)

-targets += vdso32/vdso32.lds $(vobjs32-y)
+targets += vdso32.lds $(vobjs32-y)

$(obj)/%-32.o: $(src)/%.c FORCE
$(call if_changed_dep,cc_o_c)
@@ -158,7 +154,7 @@ $(obj)/%-32.o: $(src)/%.S FORCE
$(call if_changed_dep,as_o_S)

$(obj)/vdso32.so.dbg: KBUILD_CFLAGS = $(KBUILD_CFLAGS_32)
-$(obj)/vdso32.so.dbg: $(obj)/vdso32/vdso32.lds $(vobjs32) FORCE
+$(obj)/vdso32.so.dbg: $(obj)/vdso32.lds $(vobjs32) FORCE
$(call if_changed,vdso)

#
diff --git a/arch/x86/entry/vdso/int80.S b/arch/x86/entry/vdso/int80.S
new file mode 100644
index 0000000..667b25e
--- /dev/null
+++ b/arch/x86/entry/vdso/int80.S
@@ -0,0 +1,45 @@
+/*
+ * Code for the vDSO. This version uses the old int $0x80 method.
+ */
+#include <linux/linkage.h>
+
+ .text
+ .globl __kernel_vsyscall
+ .type __kernel_vsyscall,@function
+ ALIGN
+__kernel_vsyscall:
+.LSTART_vsyscall:
+ int $0x80
+ ret
+.LEND_vsyscall:
+ .size __kernel_vsyscall,.-.LSTART_vsyscall
+ .previous
+
+ .section .eh_frame,"a",@progbits
+.LSTARTFRAMEDLSI:
+ .long .LENDCIEDLSI-.LSTARTCIEDLSI
+.LSTARTCIEDLSI:
+ .long 0 /* CIE ID */
+ .byte 1 /* Version number */
+ .string "zR" /* NUL-terminated augmentation string */
+ .uleb128 1 /* Code alignment factor */
+ .sleb128 -4 /* Data alignment factor */
+ .byte 8 /* Return address register column */
+ .uleb128 1 /* Augmentation value length */
+ .byte 0x1b /* DW_EH_PE_pcrel|DW_EH_PE_sdata4. */
+ .byte 0x0c /* DW_CFA_def_cfa */
+ .uleb128 4
+ .uleb128 4
+ .byte 0x88 /* DW_CFA_offset, column 0x8 */
+ .uleb128 1
+ .align 4
+.LENDCIEDLSI:
+ .long .LENDFDEDLSI-.LSTARTFDEDLSI /* Length FDE */
+.LSTARTFDEDLSI:
+ .long .LSTARTFDEDLSI-.LSTARTFRAMEDLSI /* CIE pointer */
+ .long .LSTART_vsyscall-. /* PC-relative start address */
+ .long .LEND_vsyscall-.LSTART_vsyscall
+ .uleb128 0
+ .align 4
+.LENDFDEDLSI:
+ .previous
diff --git a/arch/x86/entry/vdso/sigreturn.S b/arch/x86/entry/vdso/sigreturn.S
new file mode 100644
index 0000000..ca0e6ca
--- /dev/null
+++ b/arch/x86/entry/vdso/sigreturn.S
@@ -0,0 +1,150 @@
+/*
+ * Common code for the sigreturn entry points in vDSO images.
+ * So far this code is the same for both int80 and sysenter versions.
+ * This file is #include'd by int80.S et al to define them first thing.
+ * The kernel assumes that the addresses of these routines are constant
+ * for all vDSO implementations.
+ */
+
+#include <linux/linkage.h>
+#include <asm/unistd_32.h>
+#include <asm/asm-offsets.h>
+#include <asm/alternative-asm.h>
+#include <asm/cpufeature.h>
+
+#ifdef CONFIG_COMPAT
+#define SYSCALL_ENTER_KERNEL \
+ ALTERNATIVE "int $0x80", "syscall", X86_FEATURE_SYSCALL32
+#else
+#define SYSCALL_ENTER_KERNEL int $0x80
+#endif
+
+ .text
+ .globl __kernel_sigreturn
+ .type __kernel_sigreturn,@function
+ nop /* this guy is needed for .LSTARTFDEDLSI1 below (watch for HACK) */
+ ALIGN
+__kernel_sigreturn:
+.LSTART_sigreturn:
+ popl %eax /* XXX does this mean it needs unwind info? */
+ movl $__NR_sigreturn, %eax
+ SYSCALL_ENTER_KERNEL
+.LEND_sigreturn:
+ nop
+ .size __kernel_sigreturn,.-.LSTART_sigreturn
+
+ .globl __kernel_rt_sigreturn
+ .type __kernel_rt_sigreturn,@function
+ ALIGN
+__kernel_rt_sigreturn:
+.LSTART_rt_sigreturn:
+ movl $__NR_rt_sigreturn, %eax
+ SYSCALL_ENTER_KERNEL
+.LEND_rt_sigreturn:
+ nop
+ .size __kernel_rt_sigreturn,.-.LSTART_rt_sigreturn
+ .previous
+
+ .section .eh_frame,"a",@progbits
+.LSTARTFRAMEDLSI1:
+ .long .LENDCIEDLSI1-.LSTARTCIEDLSI1
+.LSTARTCIEDLSI1:
+ .long 0 /* CIE ID */
+ .byte 1 /* Version number */
+ .string "zRS" /* NUL-terminated augmentation string */
+ .uleb128 1 /* Code alignment factor */
+ .sleb128 -4 /* Data alignment factor */
+ .byte 8 /* Return address register column */
+ .uleb128 1 /* Augmentation value length */
+ .byte 0x1b /* DW_EH_PE_pcrel|DW_EH_PE_sdata4. */
+ .byte 0 /* DW_CFA_nop */
+ .align 4
+.LENDCIEDLSI1:
+ .long .LENDFDEDLSI1-.LSTARTFDEDLSI1 /* Length FDE */
+.LSTARTFDEDLSI1:
+ .long .LSTARTFDEDLSI1-.LSTARTFRAMEDLSI1 /* CIE pointer */
+ /* HACK: The dwarf2 unwind routines will subtract 1 from the
+ return address to get an address in the middle of the
+ presumed call instruction. Since we didn't get here via
+ a call, we need to include the nop before the real start
+ to make up for it. */
+ .long .LSTART_sigreturn-1-. /* PC-relative start address */
+ .long .LEND_sigreturn-.LSTART_sigreturn+1
+ .uleb128 0 /* Augmentation */
+ /* What follows are the instructions for the table generation.
+ We record the locations of each register saved. This is
+ complicated by the fact that the "CFA" is always assumed to
+ be the value of the stack pointer in the caller. This means
+ that we must define the CFA of this body of code to be the
+ saved value of the stack pointer in the sigcontext. Which
+ also means that there is no fixed relation to the other
+ saved registers, which means that we must use DW_CFA_expression
+ to compute their addresses. It also means that when we
+ adjust the stack with the popl, we have to do it all over again. */
+
+#define do_cfa_expr(offset) \
+ .byte 0x0f; /* DW_CFA_def_cfa_expression */ \
+ .uleb128 1f-0f; /* length */ \
+0: .byte 0x74; /* DW_OP_breg4 */ \
+ .sleb128 offset; /* offset */ \
+ .byte 0x06; /* DW_OP_deref */ \
+1:
+
+#define do_expr(regno, offset) \
+ .byte 0x10; /* DW_CFA_expression */ \
+ .uleb128 regno; /* regno */ \
+ .uleb128 1f-0f; /* length */ \
+0: .byte 0x74; /* DW_OP_breg4 */ \
+ .sleb128 offset; /* offset */ \
+1:
+
+ do_cfa_expr(IA32_SIGCONTEXT_sp+4)
+ do_expr(0, IA32_SIGCONTEXT_ax+4)
+ do_expr(1, IA32_SIGCONTEXT_cx+4)
+ do_expr(2, IA32_SIGCONTEXT_dx+4)
+ do_expr(3, IA32_SIGCONTEXT_bx+4)
+ do_expr(5, IA32_SIGCONTEXT_bp+4)
+ do_expr(6, IA32_SIGCONTEXT_si+4)
+ do_expr(7, IA32_SIGCONTEXT_di+4)
+ do_expr(8, IA32_SIGCONTEXT_ip+4)
+
+ .byte 0x42 /* DW_CFA_advance_loc 2 -- nop; popl eax. */
+
+ do_cfa_expr(IA32_SIGCONTEXT_sp)
+ do_expr(0, IA32_SIGCONTEXT_ax)
+ do_expr(1, IA32_SIGCONTEXT_cx)
+ do_expr(2, IA32_SIGCONTEXT_dx)
+ do_expr(3, IA32_SIGCONTEXT_bx)
+ do_expr(5, IA32_SIGCONTEXT_bp)
+ do_expr(6, IA32_SIGCONTEXT_si)
+ do_expr(7, IA32_SIGCONTEXT_di)
+ do_expr(8, IA32_SIGCONTEXT_ip)
+
+ .align 4
+.LENDFDEDLSI1:
+
+ .long .LENDFDEDLSI2-.LSTARTFDEDLSI2 /* Length FDE */
+.LSTARTFDEDLSI2:
+ .long .LSTARTFDEDLSI2-.LSTARTFRAMEDLSI1 /* CIE pointer */
+ /* HACK: See above wrt unwind library assumptions. */
+ .long .LSTART_rt_sigreturn-1-. /* PC-relative start address */
+ .long .LEND_rt_sigreturn-.LSTART_rt_sigreturn+1
+ .uleb128 0 /* Augmentation */
+ /* What follows are the instructions for the table generation.
+ We record the locations of each register saved. This is
+ slightly less complicated than the above, since we don't
+ modify the stack pointer in the process. */
+
+ do_cfa_expr(IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_sp)
+ do_expr(0, IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_ax)
+ do_expr(1, IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_cx)
+ do_expr(2, IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_dx)
+ do_expr(3, IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_bx)
+ do_expr(5, IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_bp)
+ do_expr(6, IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_si)
+ do_expr(7, IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_di)
+ do_expr(8, IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_ip)
+
+ .align 4
+.LENDFDEDLSI2:
+ .previous
diff --git a/arch/x86/entry/vdso/syscall.S b/arch/x86/entry/vdso/syscall.S
new file mode 100644
index 0000000..50490c8
--- /dev/null
+++ b/arch/x86/entry/vdso/syscall.S
@@ -0,0 +1,62 @@
+/*
+ * Code for the vDSO. This version uses the syscall instruction.
+ */
+#include <linux/linkage.h>
+#include <asm/segment.h>
+
+ .text
+ .globl __kernel_vsyscall_syscall
+ .type __kernel_vsyscall_syscall,@function
+ ALIGN
+__kernel_vsyscall_syscall:
+.LSTART_vsyscall:
+ push %ebp
+.Lpush_ebp:
+ movl %ecx, %ebp
+ syscall
+ movl %ebp, %ecx
+ popl %ebp
+.Lpop_ebp:
+ ret
+.LEND_vsyscall:
+ .size __kernel_vsyscall_syscall,.-.LSTART_vsyscall
+
+ .section .eh_frame,"a",@progbits
+.LSTARTFRAME:
+ .long .LENDCIE-.LSTARTCIE
+.LSTARTCIE:
+ .long 0 /* CIE ID */
+ .byte 1 /* Version number */
+ .string "zR" /* NUL-terminated augmentation string */
+ .uleb128 1 /* Code alignment factor */
+ .sleb128 -4 /* Data alignment factor */
+ .byte 8 /* Return address register column */
+ .uleb128 1 /* Augmentation value length */
+ .byte 0x1b /* DW_EH_PE_pcrel|DW_EH_PE_sdata4. */
+ .byte 0x0c /* DW_CFA_def_cfa */
+ .uleb128 4
+ .uleb128 4
+ .byte 0x88 /* DW_CFA_offset, column 0x8 */
+ .uleb128 1
+ .align 4
+.LENDCIE:
+
+ .long .LENDFDE1-.LSTARTFDE1 /* Length FDE */
+.LSTARTFDE1:
+ .long .LSTARTFDE1-.LSTARTFRAME /* CIE pointer */
+ .long .LSTART_vsyscall-. /* PC-relative start address */
+ .long .LEND_vsyscall-.LSTART_vsyscall
+ .uleb128 0 /* Augmentation length */
+ /* What follows are the instructions for the table generation.
+ We have to record all changes of the stack pointer. */
+ .byte 0x40 + .Lpush_ebp-.LSTART_vsyscall /* DW_CFA_advance_loc */
+ .byte 0x0e /* DW_CFA_def_cfa_offset */
+ .uleb128 8
+ .byte 0x85, 0x02 /* DW_CFA_offset %ebp -8 */
+ .byte 0x40 + .Lpop_ebp-.Lpush_ebp /* DW_CFA_advance_loc */
+ .byte 0xc5 /* DW_CFA_restore %ebp */
+ .byte 0x0e /* DW_CFA_def_cfa_offset */
+ .uleb128 4
+ .align 4
+.LENDFDE1:
+ .previous
diff --git a/arch/x86/entry/vdso/sysenter.S b/arch/x86/entry/vdso/sysenter.S
new file mode 100644
index 0000000..458954a
--- /dev/null
+++ b/arch/x86/entry/vdso/sysenter.S
@@ -0,0 +1,107 @@
+/*
+ * Code for the vDSO. This version uses the sysenter instruction.
+ */
+#include <linux/linkage.h>
+
+/*
+ * The caller puts arg2 in %ecx, which gets pushed. The kernel will use
+ * %ecx itself for arg2. The pushing is because the sysexit instruction
+ * (found in entry.S) requires that we clobber %ecx with the desired %esp.
+ * User code might expect that %ecx is unclobbered though, as it would be
+ * for returning via the iret instruction, so we must push and pop.
+ *
+ * The caller puts arg3 in %edx, which the sysexit instruction requires
+ * for %eip. Thus, exactly as for arg2, we must push and pop.
+ *
+ * Arg6 is different. The caller puts arg6 in %ebp. Since the sysenter
+ * instruction clobbers %esp, the user's %esp won't even survive entry
+ * into the kernel. We store %esp in %ebp. Code in entry.S must fetch
+ * arg6 from the stack.
+ *
+ * You can not use this vsyscall for the clone() syscall because the
+ * three words on the parent stack do not get copied to the child.
+ */
+ .text
+ .globl __kernel_vsyscall_sysenter
+ .type __kernel_vsyscall_sysenter,@function
+ ALIGN
+__kernel_vsyscall_sysenter:
+.LSTART_vsyscall:
+ push %ecx
+.Lpush_ecx:
+ push %edx
+.Lpush_edx:
+ push %ebp
+.Lenter_kernel:
+ movl %esp,%ebp
+ sysenter
+
+ /* 7: align return point with nop's to make disassembly easier */
+ .space 7,0x90
+
+ /* 14: System call restart point is here! (SYSENTER_RETURN-2) */
+ int $0x80
+ /* 16: System call normal return point is here! */
+VDSO32_SYSENTER_RETURN: /* Symbol used by sysenter.c via vdso32-syms.h */
+ pop %ebp
+.Lpop_ebp:
+ pop %edx
+.Lpop_edx:
+ pop %ecx
+.Lpop_ecx:
+ ret
+.LEND_vsyscall:
+ .size __kernel_vsyscall_sysenter,.-.LSTART_vsyscall
+ .previous
+
+ .section .eh_frame,"a",@progbits
+.LSTARTFRAMEDLSI:
+ .long .LENDCIEDLSI-.LSTARTCIEDLSI
+.LSTARTCIEDLSI:
+ .long 0 /* CIE ID */
+ .byte 1 /* Version number */
+ .string "zR" /* NUL-terminated augmentation string */
+ .uleb128 1 /* Code alignment factor */
+ .sleb128 -4 /* Data alignment factor */
+ .byte 8 /* Return address register column */
+ .uleb128 1 /* Augmentation value length */
+ .byte 0x1b /* DW_EH_PE_pcrel|DW_EH_PE_sdata4. */
+ .byte 0x0c /* DW_CFA_def_cfa */
+ .uleb128 4
+ .uleb128 4
+ .byte 0x88 /* DW_CFA_offset, column 0x8 */
+ .uleb128 1
+ .align 4
+.LENDCIEDLSI:
+ .long .LENDFDEDLSI-.LSTARTFDEDLSI /* Length FDE */
+.LSTARTFDEDLSI:
+ .long .LSTARTFDEDLSI-.LSTARTFRAMEDLSI /* CIE pointer */
+ .long .LSTART_vsyscall-. /* PC-relative start address */
+ .long .LEND_vsyscall-.LSTART_vsyscall
+ .uleb128 0
+ /* What follows are the instructions for the table generation.
+ We have to record all changes of the stack pointer. */
+ .byte 0x40 + (.Lpush_ecx-.LSTART_vsyscall) /* DW_CFA_advance_loc */
+ .byte 0x0e /* DW_CFA_def_cfa_offset */
+ .byte 0x08 /* RA at offset 8 now */
+ .byte 0x40 + (.Lpush_edx-.Lpush_ecx) /* DW_CFA_advance_loc */
+ .byte 0x0e /* DW_CFA_def_cfa_offset */
+ .byte 0x0c /* RA at offset 12 now */
+ .byte 0x40 + (.Lenter_kernel-.Lpush_edx) /* DW_CFA_advance_loc */
+ .byte 0x0e /* DW_CFA_def_cfa_offset */
+ .byte 0x10 /* RA at offset 16 now */
+ .byte 0x85, 0x04 /* DW_CFA_offset %ebp -16 */
+ /* Finally the epilogue. */
+ .byte 0x40 + (.Lpop_ebp-.Lenter_kernel) /* DW_CFA_advance_loc */
+ .byte 0x0e /* DW_CFA_def_cfa_offset */
+ .byte 0x0c /* RA at offset 12 now */
+ .byte 0xc5 /* DW_CFA_restore %ebp */
+ .byte 0x40 + (.Lpop_edx-.Lpop_ebp) /* DW_CFA_advance_loc */
+ .byte 0x0e /* DW_CFA_def_cfa_offset */
+ .byte 0x08 /* RA at offset 8 now */
+ .byte 0x40 + (.Lpop_ecx-.Lpop_edx) /* DW_CFA_advance_loc */
+ .byte 0x0e /* DW_CFA_def_cfa_offset */
+ .byte 0x04 /* RA at offset 4 now */
+ .align 4
+.LENDFDEDLSI:
+ .previous
diff --git a/arch/x86/entry/vdso/vdso32.lds.S b/arch/x86/entry/vdso/vdso32.lds.S
new file mode 100644
index 0000000..d90e607
--- /dev/null
+++ b/arch/x86/entry/vdso/vdso32.lds.S
@@ -0,0 +1,37 @@
+/*
+ * Linker script for 32-bit vDSO.
+ * We #include the file to define the layout details.
+ *
+ * This file defines the version script giving the user-exported symbols in
+ * the DSO.
+ */
+
+#include <asm/page.h>
+
+#define BUILD_VDSO32
+
+#include "vdso-layout.lds.S"
+
+/* The ELF entry point can be used to set the AT_SYSINFO value. */
+ENTRY(__kernel_vsyscall);
+
+/*
+ * This controls what userland symbols we export from the vDSO.
+ */
+VERSION
+{
+ LINUX_2.6 {
+ global:
+ __vdso_clock_gettime;
+ __vdso_gettimeofday;
+ __vdso_time;
+ };
+
+ LINUX_2.5 {
+ global:
+ __kernel_vsyscall;
+ __kernel_sigreturn;
+ __kernel_rt_sigreturn;
+ local: *;
+ };
+}
diff --git a/arch/x86/entry/vdso/vdso32/.gitignore b/arch/x86/entry/vdso/vdso32/.gitignore
deleted file mode 100644
index e45fba9..0000000
--- a/arch/x86/entry/vdso/vdso32/.gitignore
+++ /dev/null
@@ -1 +0,0 @@
-vdso32.lds
diff --git a/arch/x86/entry/vdso/vdso32/int80.S b/arch/x86/entry/vdso/vdso32/int80.S
deleted file mode 100644
index 667b25e..0000000
--- a/arch/x86/entry/vdso/vdso32/int80.S
+++ /dev/null
@@ -1,45 +0,0 @@
-/*
- * Code for the vDSO. This version uses the old int $0x80 method.
- */
-#include <linux/linkage.h>
-
- .text
- .globl __kernel_vsyscall
- .type __kernel_vsyscall,@function
- ALIGN
-__kernel_vsyscall:
-.LSTART_vsyscall:
- int $0x80
- ret
-.LEND_vsyscall:
- .size __kernel_vsyscall,.-.LSTART_vsyscall
- .previous
-
- .section .eh_frame,"a",@progbits
-.LSTARTFRAMEDLSI:
- .long .LENDCIEDLSI-.LSTARTCIEDLSI
-.LSTARTCIEDLSI:
- .long 0 /* CIE ID */
- .byte 1 /* Version number */
- .string "zR" /* NUL-terminated augmentation string */
- .uleb128 1 /* Code alignment factor */
- .sleb128 -4 /* Data alignment factor */
- .byte 8 /* Return address register column */
- .uleb128 1 /* Augmentation value length */
- .byte 0x1b /* DW_EH_PE_pcrel|DW_EH_PE_sdata4. */
- .byte 0x0c /* DW_CFA_def_cfa */
- .uleb128 4
- .uleb128 4
- .byte 0x88 /* DW_CFA_offset, column 0x8 */
- .uleb128 1
- .align 4
-.LENDCIEDLSI:
- .long .LENDFDEDLSI-.LSTARTFDEDLSI /* Length FDE */
-.LSTARTFDEDLSI:
- .long .LSTARTFDEDLSI-.LSTARTFRAMEDLSI /* CIE pointer */
- .long .LSTART_vsyscall-. /* PC-relative start address */
- .long .LEND_vsyscall-.LSTART_vsyscall
- .uleb128 0
- .align 4
-.LENDFDEDLSI:
- .previous
diff --git a/arch/x86/entry/vdso/vdso32/sigreturn.S b/arch/x86/entry/vdso/vdso32/sigreturn.S
deleted file mode 100644
index ca0e6ca..0000000
--- a/arch/x86/entry/vdso/vdso32/sigreturn.S
+++ /dev/null
@@ -1,150 +0,0 @@
-/*
- * Common code for the sigreturn entry points in vDSO images.
- * So far this code is the same for both int80 and sysenter versions.
- * This file is #include'd by int80.S et al to define them first thing.
- * The kernel assumes that the addresses of these routines are constant
- * for all vDSO implementations.
- */
-
-#include <linux/linkage.h>
-#include <asm/unistd_32.h>
-#include <asm/asm-offsets.h>
-#include <asm/alternative-asm.h>
-#include <asm/cpufeature.h>
-
-#ifdef CONFIG_COMPAT
-#define SYSCALL_ENTER_KERNEL \
- ALTERNATIVE "int $0x80", "syscall", X86_FEATURE_SYSCALL32
-#else
-#define SYSCALL_ENTER_KERNEL int $0x80
-#endif
-
- .text
- .globl __kernel_sigreturn
- .type __kernel_sigreturn,@function
- nop /* this guy is needed for .LSTARTFDEDLSI1 below (watch for HACK) */
- ALIGN
-__kernel_sigreturn:
-.LSTART_sigreturn:
- popl %eax /* XXX does this mean it needs unwind info? */
- movl $__NR_sigreturn, %eax
- SYSCALL_ENTER_KERNEL
-.LEND_sigreturn:
- nop
- .size __kernel_sigreturn,.-.LSTART_sigreturn
-
- .globl __kernel_rt_sigreturn
- .type __kernel_rt_sigreturn,@function
- ALIGN
-__kernel_rt_sigreturn:
-.LSTART_rt_sigreturn:
- movl $__NR_rt_sigreturn, %eax
- SYSCALL_ENTER_KERNEL
-.LEND_rt_sigreturn:
- nop
- .size __kernel_rt_sigreturn,.-.LSTART_rt_sigreturn
- .previous
-
- .section .eh_frame,"a",@progbits
-.LSTARTFRAMEDLSI1:
- .long .LENDCIEDLSI1-.LSTARTCIEDLSI1
-.LSTARTCIEDLSI1:
- .long 0 /* CIE ID */
- .byte 1 /* Version number */
- .string "zRS" /* NUL-terminated augmentation string */
- .uleb128 1 /* Code alignment factor */
- .sleb128 -4 /* Data alignment factor */
- .byte 8 /* Return address register column */
- .uleb128 1 /* Augmentation value length */
- .byte 0x1b /* DW_EH_PE_pcrel|DW_EH_PE_sdata4. */
- .byte 0 /* DW_CFA_nop */
- .align 4
-.LENDCIEDLSI1:
- .long .LENDFDEDLSI1-.LSTARTFDEDLSI1 /* Length FDE */
-.LSTARTFDEDLSI1:
- .long .LSTARTFDEDLSI1-.LSTARTFRAMEDLSI1 /* CIE pointer */
- /* HACK: The dwarf2 unwind routines will subtract 1 from the
- return address to get an address in the middle of the
- presumed call instruction. Since we didn't get here via
- a call, we need to include the nop before the real start
- to make up for it. */
- .long .LSTART_sigreturn-1-. /* PC-relative start address */
- .long .LEND_sigreturn-.LSTART_sigreturn+1
- .uleb128 0 /* Augmentation */
- /* What follows are the instructions for the table generation.
- We record the locations of each register saved. This is
- complicated by the fact that the "CFA" is always assumed to
- be the value of the stack pointer in the caller. This means
- that we must define the CFA of this body of code to be the
- saved value of the stack pointer in the sigcontext. Which
- also means that there is no fixed relation to the other
- saved registers, which means that we must use DW_CFA_expression
- to compute their addresses. It also means that when we
- adjust the stack with the popl, we have to do it all over again. */
-
-#define do_cfa_expr(offset) \
- .byte 0x0f; /* DW_CFA_def_cfa_expression */ \
- .uleb128 1f-0f; /* length */ \
-0: .byte 0x74; /* DW_OP_breg4 */ \
- .sleb128 offset; /* offset */ \
- .byte 0x06; /* DW_OP_deref */ \
-1:
-
-#define do_expr(regno, offset) \
- .byte 0x10; /* DW_CFA_expression */ \
- .uleb128 regno; /* regno */ \
- .uleb128 1f-0f; /* length */ \
-0: .byte 0x74; /* DW_OP_breg4 */ \
- .sleb128 offset; /* offset */ \
-1:
-
- do_cfa_expr(IA32_SIGCONTEXT_sp+4)
- do_expr(0, IA32_SIGCONTEXT_ax+4)
- do_expr(1, IA32_SIGCONTEXT_cx+4)
- do_expr(2, IA32_SIGCONTEXT_dx+4)
- do_expr(3, IA32_SIGCONTEXT_bx+4)
- do_expr(5, IA32_SIGCONTEXT_bp+4)
- do_expr(6, IA32_SIGCONTEXT_si+4)
- do_expr(7, IA32_SIGCONTEXT_di+4)
- do_expr(8, IA32_SIGCONTEXT_ip+4)
-
- .byte 0x42 /* DW_CFA_advance_loc 2 -- nop; popl eax. */
-
- do_cfa_expr(IA32_SIGCONTEXT_sp)
- do_expr(0, IA32_SIGCONTEXT_ax)
- do_expr(1, IA32_SIGCONTEXT_cx)
- do_expr(2, IA32_SIGCONTEXT_dx)
- do_expr(3, IA32_SIGCONTEXT_bx)
- do_expr(5, IA32_SIGCONTEXT_bp)
- do_expr(6, IA32_SIGCONTEXT_si)
- do_expr(7, IA32_SIGCONTEXT_di)
- do_expr(8, IA32_SIGCONTEXT_ip)
-
- .align 4
-.LENDFDEDLSI1:
-
- .long .LENDFDEDLSI2-.LSTARTFDEDLSI2 /* Length FDE */
-.LSTARTFDEDLSI2:
- .long .LSTARTFDEDLSI2-.LSTARTFRAMEDLSI1 /* CIE pointer */
- /* HACK: See above wrt unwind library assumptions. */
- .long .LSTART_rt_sigreturn-1-. /* PC-relative start address */
- .long .LEND_rt_sigreturn-.LSTART_rt_sigreturn+1
- .uleb128 0 /* Augmentation */
- /* What follows are the instructions for the table generation.
- We record the locations of each register saved. This is
- slightly less complicated than the above, since we don't
- modify the stack pointer in the process. */
-
- do_cfa_expr(IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_sp)
- do_expr(0, IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_ax)
- do_expr(1, IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_cx)
- do_expr(2, IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_dx)
- do_expr(3, IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_bx)
- do_expr(5, IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_bp)
- do_expr(6, IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_si)
- do_expr(7, IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_di)
- do_expr(8, IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_ip)
-
- .align 4
-.LENDFDEDLSI2:
- .previous
diff --git a/arch/x86/entry/vdso/vdso32/syscall.S b/arch/x86/entry/vdso/vdso32/syscall.S
deleted file mode 100644
index 50490c8..0000000
--- a/arch/x86/entry/vdso/vdso32/syscall.S
+++ /dev/null
@@ -1,62 +0,0 @@
-/*
- * Code for the vDSO. This version uses the syscall instruction.
- */
-#include <linux/linkage.h>
-#include <asm/segment.h>
-
- .text
- .globl __kernel_vsyscall_syscall
- .type __kernel_vsyscall_syscall,@function
- ALIGN
-__kernel_vsyscall_syscall:
-.LSTART_vsyscall:
- push %ebp
-.Lpush_ebp:
- movl %ecx, %ebp
- syscall
- movl %ebp, %ecx
- popl %ebp
-.Lpop_ebp:
- ret
-.LEND_vsyscall:
- .size __kernel_vsyscall_syscall,.-.LSTART_vsyscall
-
- .section .eh_frame,"a",@progbits
-.LSTARTFRAME:
- .long .LENDCIE-.LSTARTCIE
-.LSTARTCIE:
- .long 0 /* CIE ID */
- .byte 1 /* Version number */
- .string "zR" /* NUL-terminated augmentation string */
- .uleb128 1 /* Code alignment factor */
- .sleb128 -4 /* Data alignment factor */
- .byte 8 /* Return address register column */
- .uleb128 1 /* Augmentation value length */
- .byte 0x1b /* DW_EH_PE_pcrel|DW_EH_PE_sdata4. */
- .byte 0x0c /* DW_CFA_def_cfa */
- .uleb128 4
- .uleb128 4
- .byte 0x88 /* DW_CFA_offset, column 0x8 */
- .uleb128 1
- .align 4
-.LENDCIE:
-
- .long .LENDFDE1-.LSTARTFDE1 /* Length FDE */
-.LSTARTFDE1:
- .long .LSTARTFDE1-.LSTARTFRAME /* CIE pointer */
- .long .LSTART_vsyscall-. /* PC-relative start address */
- .long .LEND_vsyscall-.LSTART_vsyscall
- .uleb128 0 /* Augmentation length */
- /* What follows are the instructions for the table generation.
- We have to record all changes of the stack pointer. */
- .byte 0x40 + .Lpush_ebp-.LSTART_vsyscall /* DW_CFA_advance_loc */
- .byte 0x0e /* DW_CFA_def_cfa_offset */
- .uleb128 8
- .byte 0x85, 0x02 /* DW_CFA_offset %ebp -8 */
- .byte 0x40 + .Lpop_ebp-.Lpush_ebp /* DW_CFA_advance_loc */
- .byte 0xc5 /* DW_CFA_restore %ebp */
- .byte 0x0e /* DW_CFA_def_cfa_offset */
- .uleb128 4
- .align 4
-.LENDFDE1:
- .previous
diff --git a/arch/x86/entry/vdso/vdso32/sysenter.S b/arch/x86/entry/vdso/vdso32/sysenter.S
deleted file mode 100644
index 458954a..0000000
--- a/arch/x86/entry/vdso/vdso32/sysenter.S
+++ /dev/null
@@ -1,107 +0,0 @@
-/*
- * Code for the vDSO. This version uses the sysenter instruction.
- */
-#include <linux/linkage.h>
-
-/*
- * The caller puts arg2 in %ecx, which gets pushed. The kernel will use
- * %ecx itself for arg2. The pushing is because the sysexit instruction
- * (found in entry.S) requires that we clobber %ecx with the desired %esp.
- * User code might expect that %ecx is unclobbered though, as it would be
- * for returning via the iret instruction, so we must push and pop.
- *
- * The caller puts arg3 in %edx, which the sysexit instruction requires
- * for %eip. Thus, exactly as for arg2, we must push and pop.
- *
- * Arg6 is different. The caller puts arg6 in %ebp. Since the sysenter
- * instruction clobbers %esp, the user's %esp won't even survive entry
- * into the kernel. We store %esp in %ebp. Code in entry.S must fetch
- * arg6 from the stack.
- *
- * You can not use this vsyscall for the clone() syscall because the
- * three words on the parent stack do not get copied to the child.
- */
- .text
- .globl __kernel_vsyscall_sysenter
- .type __kernel_vsyscall_sysenter,@function
- ALIGN
-__kernel_vsyscall_sysenter:
-.LSTART_vsyscall:
- push %ecx
-.Lpush_ecx:
- push %edx
-.Lpush_edx:
- push %ebp
-.Lenter_kernel:
- movl %esp,%ebp
- sysenter
-
- /* 7: align return point with nop's to make disassembly easier */
- .space 7,0x90
-
- /* 14: System call restart point is here! (SYSENTER_RETURN-2) */
- int $0x80
- /* 16: System call normal return point is here! */
-VDSO32_SYSENTER_RETURN: /* Symbol used by sysenter.c via vdso32-syms.h */
- pop %ebp
-.Lpop_ebp:
- pop %edx
-.Lpop_edx:
- pop %ecx
-.Lpop_ecx:
- ret
-.LEND_vsyscall:
- .size __kernel_vsyscall_sysenter,.-.LSTART_vsyscall
- .previous
-
- .section .eh_frame,"a",@progbits
-.LSTARTFRAMEDLSI:
- .long .LENDCIEDLSI-.LSTARTCIEDLSI
-.LSTARTCIEDLSI:
- .long 0 /* CIE ID */
- .byte 1 /* Version number */
- .string "zR" /* NUL-terminated augmentation string */
- .uleb128 1 /* Code alignment factor */
- .sleb128 -4 /* Data alignment factor */
- .byte 8 /* Return address register column */
- .uleb128 1 /* Augmentation value length */
- .byte 0x1b /* DW_EH_PE_pcrel|DW_EH_PE_sdata4. */
- .byte 0x0c /* DW_CFA_def_cfa */
- .uleb128 4
- .uleb128 4
- .byte 0x88 /* DW_CFA_offset, column 0x8 */
- .uleb128 1
- .align 4
-.LENDCIEDLSI:
- .long .LENDFDEDLSI-.LSTARTFDEDLSI /* Length FDE */
-.LSTARTFDEDLSI:
- .long .LSTARTFDEDLSI-.LSTARTFRAMEDLSI /* CIE pointer */
- .long .LSTART_vsyscall-. /* PC-relative start address */
- .long .LEND_vsyscall-.LSTART_vsyscall
- .uleb128 0
- /* What follows are the instructions for the table generation.
- We have to record all changes of the stack pointer. */
- .byte 0x40 + (.Lpush_ecx-.LSTART_vsyscall) /* DW_CFA_advance_loc */
- .byte 0x0e /* DW_CFA_def_cfa_offset */
- .byte 0x08 /* RA at offset 8 now */
- .byte 0x40 + (.Lpush_edx-.Lpush_ecx) /* DW_CFA_advance_loc */
- .byte 0x0e /* DW_CFA_def_cfa_offset */
- .byte 0x0c /* RA at offset 12 now */
- .byte 0x40 + (.Lenter_kernel-.Lpush_edx) /* DW_CFA_advance_loc */
- .byte 0x0e /* DW_CFA_def_cfa_offset */
- .byte 0x10 /* RA at offset 16 now */
- .byte 0x85, 0x04 /* DW_CFA_offset %ebp -16 */
- /* Finally the epilogue. */
- .byte 0x40 + (.Lpop_ebp-.Lenter_kernel) /* DW_CFA_advance_loc */
- .byte 0x0e /* DW_CFA_def_cfa_offset */
- .byte 0x0c /* RA at offset 12 now */
- .byte 0xc5 /* DW_CFA_restore %ebp */
- .byte 0x40 + (.Lpop_edx-.Lpop_ebp) /* DW_CFA_advance_loc */
- .byte 0x0e /* DW_CFA_def_cfa_offset */
- .byte 0x08 /* RA at offset 8 now */
- .byte 0x40 + (.Lpop_ecx-.Lpop_edx) /* DW_CFA_advance_loc */
- .byte 0x0e /* DW_CFA_def_cfa_offset */
- .byte 0x04 /* RA at offset 4 now */
- .align 4
-.LENDFDEDLSI:
- .previous
diff --git a/arch/x86/entry/vdso/vdso32/vdso32.lds.S b/arch/x86/entry/vdso/vdso32/vdso32.lds.S
deleted file mode 100644
index 31056cf..0000000
--- a/arch/x86/entry/vdso/vdso32/vdso32.lds.S
+++ /dev/null
@@ -1,37 +0,0 @@
-/*
- * Linker script for 32-bit vDSO.
- * We #include the file to define the layout details.
- *
- * This file defines the version script giving the user-exported symbols in
- * the DSO.
- */
-
-#include <asm/page.h>
-
-#define BUILD_VDSO32
-
-#include "../vdso-layout.lds.S"
-
-/* The ELF entry point can be used to set the AT_SYSINFO value. */
-ENTRY(__kernel_vsyscall);
-
-/*
- * This controls what userland symbols we export from the vDSO.
- */
-VERSION
-{
- LINUX_2.6 {
- global:
- __vdso_clock_gettime;
- __vdso_gettimeofday;
- __vdso_time;
- };
-
- LINUX_2.5 {
- global:
- __kernel_vsyscall;
- __kernel_sigreturn;
- __kernel_rt_sigreturn;
- local: *;
- };
-}
--
2.4.3

2015-08-29 16:11:08

by Andy Lutomirski

Subject: Re: [PATCH 0/7] x86 vdso32 cleanups

On Sat, Aug 29, 2015 at 8:20 AM, Brian Gerst <[email protected]> wrote:
> This patch set contains several cleanups to the 32-bit VDSO. The
> main change is to only build one VDSO image, and select the syscall
> entry point at runtime.

Oh no, we have dueling patches!

I have a 2/3 finished series that cleans up the AT_SYSINFO mess
differently, as I outlined earlier. I've only done the compat and
common bits (no 32-bit native support quite yet), and it enters
successfully on Intel using SYSENTER and on (fake) AMD using SYSCALL.
The SYSRET bit isn't there yet.

Other than some ifdeffery, the final system_call.S looks like this:

https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/tree/arch/x86/entry/vdso/vdso32/system_call.S?h=x86/entry_compat

The meat is (sorry for whitespace damage):

.text
.globl __kernel_vsyscall
.type __kernel_vsyscall,@function
ALIGN
__kernel_vsyscall:
CFI_STARTPROC
/*
* Reshuffle regs so that all of any of the entry instructions
* will preserve enough state.
*/
pushl %edx
CFI_ADJUST_CFA_OFFSET 4
CFI_REL_OFFSET edx, 0
pushl %ecx
CFI_ADJUST_CFA_OFFSET 4
CFI_REL_OFFSET ecx, 0
movl %esp, %ecx

#ifdef CONFIG_X86_64
/* If SYSENTER is available, use it. */
ALTERNATIVE_2 "", "sysenter", X86_FEATURE_SYSENTER32, \
"syscall", X86_FEATURE_SYSCALL32
#endif

/* Enter using int $0x80 */
movl (%esp), %ecx
int $0x80
GLOBAL(int80_landing_pad)

/* Restore ECX and EDX in case they were clobbered. */
popl %ecx
CFI_RESTORE ecx
CFI_ADJUST_CFA_OFFSET -4
popl %edx
CFI_RESTORE edx
CFI_ADJUST_CFA_OFFSET -4
ret
CFI_ENDPROC

.size __kernel_vsyscall,.-__kernel_vsyscall
.previous

And that's it.

What do you think? This comes with massively cleaned up kernel-side
asm as well as a test case that actually validates the CFI directives.

Certainly, a bunch of your patches make sense regardless, and I'll
review them and add them to my queue soon.

--Andy

2015-08-30 16:47:47

by Andy Lutomirski

Subject: Re: [PATCH 3/7] x86/vdso32: Remove unused vdso-fakesections.c

On Sat, Aug 29, 2015 at 8:20 AM, Brian Gerst <[email protected]> wrote:
> Signed-off-by: Brian Gerst <[email protected]>

Acked-by: Andy Lutomirski <[email protected]>

--Andy

2015-08-30 21:18:33

by Brian Gerst

Subject: Re: [PATCH 0/7] x86 vdso32 cleanups

On Sat, Aug 29, 2015 at 12:10 PM, Andy Lutomirski <[email protected]> wrote:
> On Sat, Aug 29, 2015 at 8:20 AM, Brian Gerst <[email protected]> wrote:
>> This patch set contains several cleanups to the 32-bit VDSO. The
>> main change is to only build one VDSO image, and select the syscall
>> entry point at runtime.
>
> Oh no, we have dueling patches!
>
> I have a 2/3 finished series that cleans up the AT_SYSINFO mess
> differently, as I outlined earlier. I've only done the compat and
> common bits (no 32-bit native support quite yet), and it enters
> successfully on Intel using SYSENTER and on (fake) AMD using SYSCALL.
> The SYSRET bit isn't there yet.
>
> Other than some ifdeffery, the final system_call.S looks like this:
>
> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/tree/arch/x86/entry/vdso/vdso32/system_call.S?h=x86/entry_compat
>
> The meat is (sorry for whitespace damage):
>
> .text
> .globl __kernel_vsyscall
> .type __kernel_vsyscall,@function
> ALIGN
> __kernel_vsyscall:
> CFI_STARTPROC
> /*
> * Reshuffle regs so that all of any of the entry instructions
> * will preserve enough state.
> */
> pushl %edx
> CFI_ADJUST_CFA_OFFSET 4
> CFI_REL_OFFSET edx, 0
> pushl %ecx
> CFI_ADJUST_CFA_OFFSET 4
> CFI_REL_OFFSET ecx, 0
> movl %esp, %ecx
>
> #ifdef CONFIG_X86_64
> /* If SYSENTER is available, use it. */
> ALTERNATIVE_2 "", "sysenter", X86_FEATURE_SYSENTER32, \
> "syscall", X86_FEATURE_SYSCALL32
> #endif
>
> /* Enter using int $0x80 */
> movl (%esp), %ecx
> int $0x80
> GLOBAL(int80_landing_pad)
>
> /* Restore ECX and EDX in case they were clobbered. */
> popl %ecx
> CFI_RESTORE ecx
> CFI_ADJUST_CFA_OFFSET -4
> popl %edx
> CFI_RESTORE edx
> CFI_ADJUST_CFA_OFFSET -4
> ret
> CFI_ENDPROC
>
> .size __kernel_vsyscall,.-__kernel_vsyscall
> .previous
>
> And that's it.
>
> What do you think? This comes with massively cleaned up kernel-side
> asm as well as a test case that actually validates the CFI directives.
>
> Certainly, a bunch of your patches make sense regardless, and I'll
> review them and add them to my queue soon.
>
> --Andy

How does the performance compare to the original? Looking at the
disassembly, there are two added function calls, and it reloads the
args from the stack instead of just shuffling registers.

--
Brian Gerst

2015-08-31 02:52:28

by Andy Lutomirski

Subject: Re: [PATCH 0/7] x86 vdso32 cleanups

On Sun, Aug 30, 2015 at 2:18 PM, Brian Gerst <[email protected]> wrote:
> On Sat, Aug 29, 2015 at 12:10 PM, Andy Lutomirski <[email protected]> wrote:
>> On Sat, Aug 29, 2015 at 8:20 AM, Brian Gerst <[email protected]> wrote:
>>> This patch set contains several cleanups to the 32-bit VDSO. The
>>> main change is to only build one VDSO image, and select the syscall
>>> entry point at runtime.
>>
>> Oh no, we have dueling patches!
>>
>> I have a 2/3 finished series that cleans up the AT_SYSINFO mess
>> differently, as I outlined earlier. I've only done the compat and
>> common bits (no 32-bit native support quite yet), and it enters
>> successfully on Intel using SYSENTER and on (fake) AMD using SYSCALL.
>> The SYSRET bit isn't there yet.
>>
>> Other than some ifdeffery, the final system_call.S looks like this:
>>
>> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/tree/arch/x86/entry/vdso/vdso32/system_call.S?h=x86/entry_compat
>>
>> The meat is (sorry for whitespace damage):
>>
>> .text
>> .globl __kernel_vsyscall
>> .type __kernel_vsyscall,@function
>> ALIGN
>> __kernel_vsyscall:
>> CFI_STARTPROC
>> /*
>> * Reshuffle regs so that all of any of the entry instructions
>> * will preserve enough state.
>> */
>> pushl %edx
>> CFI_ADJUST_CFA_OFFSET 4
>> CFI_REL_OFFSET edx, 0
>> pushl %ecx
>> CFI_ADJUST_CFA_OFFSET 4
>> CFI_REL_OFFSET ecx, 0
>> movl %esp, %ecx
>>
>> #ifdef CONFIG_X86_64
>> /* If SYSENTER is available, use it. */
>> ALTERNATIVE_2 "", "sysenter", X86_FEATURE_SYSENTER32, \
>> "syscall", X86_FEATURE_SYSCALL32
>> #endif
>>
>> /* Enter using int $0x80 */
>> movl (%esp), %ecx
>> int $0x80
>> GLOBAL(int80_landing_pad)
>>
>> /* Restore ECX and EDX in case they were clobbered. */
>> popl %ecx
>> CFI_RESTORE ecx
>> CFI_ADJUST_CFA_OFFSET -4
>> popl %edx
>> CFI_RESTORE edx
>> CFI_ADJUST_CFA_OFFSET -4
>> ret
>> CFI_ENDPROC
>>
>> .size __kernel_vsyscall,.-__kernel_vsyscall
>> .previous
>>
>> And that's it.
>>
>> What do you think? This comes with massively cleaned up kernel-side
>> asm as well as a test case that actually validates the CFI directives.
>>
>> Certainly, a bunch of your patches make sense regardless, and I'll
>> review them and add them to my queue soon.
>>
>> --Andy
>
> How does the performance compare to the original? Looking at the
> disassembly, there are two added function calls, and it reloads the
> args from the stack instead of just shuffling registers.

The replacement is dramatically faster, which means I probably
benchmarked it wrong. I'll try again in a day or two.

--Andy

2015-09-01 01:38:23

by Andy Lutomirski

Subject: Re: [PATCH 0/7] x86 vdso32 cleanups

On Mon, Aug 31, 2015 at 6:19 PM, Andy Lutomirski <[email protected]> wrote:
>
> On Sun, Aug 30, 2015 at 7:52 PM, Andy Lutomirski <[email protected]> wrote:
>>
>> On Sun, Aug 30, 2015 at 2:18 PM, Brian Gerst <[email protected]> wrote:
>> > On Sat, Aug 29, 2015 at 12:10 PM, Andy Lutomirski <[email protected]> wrote:
>> >> On Sat, Aug 29, 2015 at 8:20 AM, Brian Gerst <[email protected]> wrote:
>> >>> This patch set contains several cleanups to the 32-bit VDSO. The
>> >>> main change is to only build one VDSO image, and select the syscall
>> >>> entry point at runtime.
>> >>
>> >> Oh no, we have dueling patches!
>> >>
>> >> I have a 2/3 finished series that cleans up the AT_SYSINFO mess
>> >> differently, as I outlined earlier. I've only done the compat and
>> >> common bits (no 32-bit native support quite yet), and it enters
>> >> successfully on Intel using SYSENTER and on (fake) AMD using SYSCALL.
>> >> The SYSRET bit isn't there yet.
>> >>
>> >> Other than some ifdeffery, the final system_call.S looks like this:
>> >>
>> >> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/tree/arch/x86/entry/vdso/vdso32/system_call.S?h=x86/entry_compat
>> >>
>> >> The meat is (sorry for whitespace damage):
>> >>
>> >> .text
>> >> .globl __kernel_vsyscall
>> >> .type __kernel_vsyscall,@function
>> >> ALIGN
>> >> __kernel_vsyscall:
>> >> CFI_STARTPROC
>> >> /*
>> >> * Reshuffle regs so that all of any of the entry instructions
>> >> * will preserve enough state.
>> >> */
>> >> pushl %edx
>> >> CFI_ADJUST_CFA_OFFSET 4
>> >> CFI_REL_OFFSET edx, 0
>> >> pushl %ecx
>> >> CFI_ADJUST_CFA_OFFSET 4
>> >> CFI_REL_OFFSET ecx, 0
>> >> movl %esp, %ecx
>> >>
>> >> #ifdef CONFIG_X86_64
>> >> /* If SYSENTER is available, use it. */
>> >> ALTERNATIVE_2 "", "sysenter", X86_FEATURE_SYSENTER32, \
>> >> "syscall", X86_FEATURE_SYSCALL32
>> >> #endif
>> >>
>> >> /* Enter using int $0x80 */
>> >> movl (%esp), %ecx
>> >> int $0x80
>> >> GLOBAL(int80_landing_pad)
>> >>
>> >> /* Restore ECX and EDX in case they were clobbered. */
>> >> popl %ecx
>> >> CFI_RESTORE ecx
>> >> CFI_ADJUST_CFA_OFFSET -4
>> >> popl %edx
>> >> CFI_RESTORE edx
>> >> CFI_ADJUST_CFA_OFFSET -4
>> >> ret
>> >> CFI_ENDPROC
>> >>
>> >> .size __kernel_vsyscall,.-__kernel_vsyscall
>> >> .previous
>> >>
>> >> And that's it.
>> >>
>> >> What do you think? This comes with massively cleaned up kernel-side
>> >> asm as well as a test case that actually validates the CFI directives.
>> >>
>> >> Certainly, a bunch of your patches make sense regardless, and I'll
>> >> review them and add them to my queue soon.
>> >>
>> >> --Andy
>> >
>> > How does the performance compare to the original? Looking at the
>> > disassembly, there are two added function calls, and it reloads the
>> > args from the stack instead of just shuffling registers.
>>
>> The replacement is dramatically faster, which means I probably
>> benchmarked it wrong. I'll try again in a day or two.
>
>
> It's enough slower to be problematic. I need to figure out how to trace it properly. (Hmm, maybe it's time to learn how to get perf on the host to trace a KVM guest.)
>
> Everything is and was hilariously slow with context tracking on. That needs to get fixed, and hopefully once this entry stuff is done someone will do the other end of it.
>

I got random errors from perf kvm, but I think I found at least part
of the issue. The two irqs_disabled() calls in common.c are kind of
expensive. I should disable them on non-lockdep kernels.

The context tracking hooks are also too expensive, even when disabled.
I should do something to optimize those. Hello, static keys? This
doesn't affect syscalls, though.

With context tracking off and the irqs_disabled checks commented out,
we're probably doing well enough. We can always tweak the C code and
aggressively force inlining if we want a few cycles back.

--Andy