2014-02-16 21:53:46

by Stefani Seibold

Subject: [PATCH v18 0/10] Add 32 bit VDSO time function support

This patch set adds the functions vdso_gettimeofday(), vdso_clock_gettime()
and vdso_time() to the 32 bit VDSO.

The reason for doing this is to get a fast, reliable time stamp. Many
developers use the TSC directly to get a fast time stamp, without knowing
the pitfalls. The VDSO time functions are a fast and reliable alternative,
because the kernel knows the best time source and the P- and C-states of
the CPU.

The helper library to use the VDSO functions can be downloaded at
http://seibold.net/vdso.c
The library is very small, only 228 lines of code. Compile it with
gcc -Wall -O3 -fpic vdso.c -lrt -shared -o libvdso.so
and use it with LD_PRELOAD=<path>/libvdso.so
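
A minimal sketch of how such a shim can work (this is not the author's
vdso.c; it assumes a glibc that registers the vDSO as "linux-vdso.so.1"
in its link map, and the real helper may instead parse the vDSO ELF
image from the AT_SYSINFO_EHDR auxv entry):

	#include <dlfcn.h>
	#include <time.h>
	#include <unistd.h>
	#include <sys/syscall.h>

	static int (*vdso_cg)(clockid_t, struct timespec *);

	int clock_gettime(clockid_t id, struct timespec *ts)
	{
		if (!vdso_cg) {
			/* the vDSO is already mapped; just look it up */
			void *h = dlopen("linux-vdso.so.1",
					 RTLD_LAZY | RTLD_NOLOAD);
			if (h)
				vdso_cg = (int (*)(clockid_t, struct timespec *))
						dlsym(h, "__vdso_clock_gettime");
		}
		if (vdso_cg)
			return vdso_cg(id, ts);
		/* no vDSO entry found: use the real system call */
		return syscall(SYS_clock_gettime, id, ts);
	}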

This kind of helper should eventually be integrated into glibc; for
x86 64 bit and PowerPC it is already there.

Some Linux 32 bit kernel benchmark results (all measurements are in
nanoseconds):

Intel(R) Celeron(TM) CPU 400MHz

Average time kernel call:
gettimeofday(): 1039
clock_gettime(): 1578
time(): 526
Average time VDSO call:
gettimeofday(): 378
clock_gettime(): 303
time(): 60

Celeron(R) Dual-Core CPU T3100 1.90GHz

Average time kernel call:
gettimeofday(): 209
clock_gettime(): 406
time(): 135
Average time VDSO call:
gettimeofday(): 51
clock_gettime(): 43
time(): 10

So you can see a performance increase by a factor of between 4 and 13,
depending on the CPU and the function.
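
Numbers like these can be gathered with a simple loop (a sketch, not the
author's actual benchmark; run it once normally and once with LD_PRELOAD
pointing at the helper library, and link with -lrt on older glibc):

	#include <stdio.h>
	#include <time.h>

	#define LOOPS 1000000

	int main(void)
	{
		struct timespec start, end, tmp;
		long long ns;
		int i;

		clock_gettime(CLOCK_MONOTONIC, &start);
		for (i = 0; i < LOOPS; i++)
			clock_gettime(CLOCK_REALTIME, &tmp);
		clock_gettime(CLOCK_MONOTONIC, &end);

		ns = (end.tv_sec - start.tv_sec) * 1000000000LL +
		     (end.tv_nsec - start.tv_nsec);
		printf("average: %lld ns per call\n", ns / LOOPS);
		return 0;
	}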

The address layout of the VDSO has changed, because there is no fixed
address space available on an x86 32 bit kernel, despite the name: an
offset can be added to __FIXADDR_TOP for virtualization.

Also, the IA32 emulation uses the whole 4 GB address space, so no fixed
address is available there either.

This is the reason not to depend on such an address and to change the
layout of the VDSO. The VDSO for a 32 bit application now consists of
three pages:

^ Higher Address
|
+----------------------------------------+
+ VDSO page (includes code) ro+x +
+----------------------------------------+
+ VVAR page (export kernel variables) ro +
+----------------------------------------+
+ HPET page (mapped registers) ro +
+----------------------------------------+
|
v Lower Address

In compat mode, the VDSO page for a 32 bit task still resides at
0xffffe000; the VVAR and HPET pages are mapped directly below it.

In non-compat mode, the VMA of the VDSO now spans three pages on a 32 bit
kernel, so this decreases the available logical address space by two pages.
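
Worked example in compat mode, assuming the traditional 0xffffe000 base
(VDSO_OFFSET(x) is simply x * PAGE_SIZE; see the asm/vdso32.h added
later in this series):

	vdso = 0xffffe000;                          /* code page  */
	vvar = vdso - VDSO_OFFSET(VDSO_VVAR_PAGE);  /* 0xffffd000 */
	hpet = vdso - VDSO_OFFSET(VDSO_HPET_PAGE);  /* 0xffffc000 */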

The patch is against kernel 3.14 (e7651b819e90da924991d727d3c007200a18670d)

Changelog:
25.11.2012 - first release and proof of concept for linux 3.4
11.12.2012 - Port to linux 3.7 and code cleanup
12.12.2012 - fixes suggested by Andy Lutomirski
- fixes suggested by John Stultz
- use call VDSO32_vsyscall instead of int 80
- code cleanup
17.12.2012 - support for IA32_EMULATION, this includes
- code cleanup
- include cleanup to fix compile warnings and errors
- move out seqcount from seqlock, enable use in VDSO
- map FIXMAP and HPET into the 32 bit address space
18.12.2012 - split into separate patches
30.01.2014 - revamp the code
- code clean up
- VDSO layout changed
- no fixed addresses
- port to 3.14
01.02.2014 - code cleanup
02.02.2014 - code cleanup
- split into more patches
- use HPET_COUNTER instead of hard coded value
- fix changelog to the right year ;-)
02.02.2014 - reverse the mapping, this makes the new VDSO 32 bit support
fully compatible.
03.02.2014 - code cleanup
- fix comment
- fix ABI break in vdso32.lds.S
04.02.2014 - revamp IA32 emulation support
- introduce VVAR macro
- rearranged vsyscall_gtod_data structure for IA32 emulation support
- code cleanup
05.02.2014 - revamp IA32 emulation support
- replace seqcount_t by an unsigned, to make the vsyscall_gtod_data
structure independent of kernel config and functions.
08.02.2014 - revamp IA32 emulation support
- replace all internal structures by fixed-size elements
10.02.2014 - code cleanup
- add comments
- revamp inline assembly
12.02.2014 - add conditional fixmap of vvar and hpet pages for 32 bit kernel
14.02.2014 - fix CONFIG_PARAVIRT_CLOCK, which is not supported in 32 bit VDSO
15.02.2014 - fix tsc
code cleanup
tested make ARCH=i386 allyesconfig and make allyesconfig
16.02.2014 - code cleanup
- fix all C=1 warnings, also some not introduced by this patch
- hack to fix C=1 32 bit VDSO spinlock for a 64 bit kernel
- fix VDSO Makefile for newer gcc
tested for gcc 4.3.4 and 4.8.1
tested ARCH=i386 allyesconfig, defconfig and allmodconfig
tested X86_64 allyesconfig, defconfig and allmodconfig


2014-02-16 21:52:23

by Stefani Seibold

Subject: [PATCH v18 06/10] cleanup __vdso_gettimeofday

This patch does a little cleanup of the __vdso_gettimeofday() function.

It drops the unneeded local variable ret and makes the code faster when
only the timezone is needed.

Signed-off-by: Stefani Seibold <[email protected]>
---
arch/x86/vdso/vclock_gettime.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index 743f277..09dae4a 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -259,13 +259,12 @@ int clock_gettime(clockid_t, struct timespec *)

notrace int __vdso_gettimeofday(struct timeval *tv, struct timezone *tz)
{
- long ret = VCLOCK_NONE;
-
if (likely(tv != NULL)) {
BUILD_BUG_ON(offsetof(struct timeval, tv_usec) !=
offsetof(struct timespec, tv_nsec) ||
sizeof(*tv) != sizeof(struct timespec));
- ret = do_realtime((struct timespec *)tv);
+ if (unlikely(do_realtime((struct timespec *)tv) == VCLOCK_NONE))
+ return vdso_fallback_gtod(tv, tz);
tv->tv_usec /= 1000;
}
if (unlikely(tz != NULL)) {
@@ -274,8 +273,6 @@ notrace int __vdso_gettimeofday(struct timeval *tv, struct timezone *tz)
tz->tz_dsttime = gtod->sys_tz.tz_dsttime;
}

- if (ret == VCLOCK_NONE)
- return vdso_fallback_gtod(tv, tz);
return 0;
}
int gettimeofday(struct timeval *, struct timezone *)
--
1.8.5.5

2014-02-16 21:52:21

by Stefani Seibold

Subject: [PATCH v18 02/10] Add new func _install_special_mapping() to mmap.c

_install_special_mapping() is the new base function for
install_special_mapping(). It returns a pointer to the created VMA or
an error code wrapped in ERR_PTR().

This new function is needed by the vdso 32 bit support to map the
additional vvar and hpet pages into the 32 bit address space. This will
be done with io_remap_pfn_range() and remap_pfn_range(), which require
a vm_area_struct.
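
A minimal sketch of the intended call pattern (mirroring the vdso32
setup code later in this series; sizes and flags are illustrative):

	struct vm_area_struct *vma;
	int ret;

	/* reserve a two-page read-only VMA without backing pages */
	vma = _install_special_mapping(mm, addr, 2 * PAGE_SIZE,
				       VM_READ, NULL);
	if (IS_ERR(vma))
		return PTR_ERR(vma);

	/* now back one page of it with the kernel's vvar page */
	ret = remap_pfn_range(vma, addr,
			      __pa_symbol(&__vvar_page) >> PAGE_SHIFT,
			      PAGE_SIZE, PAGE_READONLY);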

Signed-off-by: Stefani Seibold <[email protected]>
---
include/linux/mm.h | 3 +++
mm/mmap.c | 20 ++++++++++++++++----
2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index f28f46e..55342aa 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1740,6 +1740,9 @@ extern void set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file);
extern struct file *get_mm_exe_file(struct mm_struct *mm);

extern int may_expand_vm(struct mm_struct *mm, unsigned long npages);
+extern struct vm_area_struct *_install_special_mapping(struct mm_struct *mm,
+ unsigned long addr, unsigned long len,
+ unsigned long flags, struct page **pages);
extern int install_special_mapping(struct mm_struct *mm,
unsigned long addr, unsigned long len,
unsigned long flags, struct page **pages);
diff --git a/mm/mmap.c b/mm/mmap.c
index 20ff0c3..81ba54f 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2918,7 +2918,7 @@ static const struct vm_operations_struct special_mapping_vmops = {
* The array pointer and the pages it points to are assumed to stay alive
* for as long as this mapping might exist.
*/
-int install_special_mapping(struct mm_struct *mm,
+struct vm_area_struct *_install_special_mapping(struct mm_struct *mm,
unsigned long addr, unsigned long len,
unsigned long vm_flags, struct page **pages)
{
@@ -2927,7 +2927,7 @@ int install_special_mapping(struct mm_struct *mm,

vma = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL);
if (unlikely(vma == NULL))
- return -ENOMEM;
+ return ERR_PTR(-ENOMEM);

INIT_LIST_HEAD(&vma->anon_vma_chain);
vma->vm_mm = mm;
@@ -2948,11 +2948,23 @@ int install_special_mapping(struct mm_struct *mm,

perf_event_mmap(vma);

- return 0;
+ return vma;

out:
kmem_cache_free(vm_area_cachep, vma);
- return ret;
+ return ERR_PTR(ret);
+}
+
+int install_special_mapping(struct mm_struct *mm,
+ unsigned long addr, unsigned long len,
+ unsigned long vm_flags, struct page **pages)
+{
+ struct vm_area_struct *vma = _install_special_mapping(mm,
+ addr, len, vm_flags, pages);
+
+ if (IS_ERR(vma))
+ return PTR_ERR(vma);
+ return 0;
}

static DEFINE_MUTEX(mm_all_locks_mutex);
--
1.8.5.5

2014-02-16 21:52:47

by Stefani Seibold

Subject: [PATCH v18 01/10] Make vsyscall_gtod_data handling x86 generic

This patch moves the vsyscall_gtod_data handling out of vsyscall_64.c
into an additional file, vsyscall_gtod.c, to make the functionality
available for the x86 32 bit kernel.

It also adds a new vsyscall_32.c which sets up the VVAR page.
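
The net effect on the vvar accessors (a sketch of the expansion, taken
from the vvar.h hunk below; 128 is the offset of vsyscall_gtod_data):

	/* x86_64: a fixed linear address */
	vvaraddr_vsyscall_gtod_data =
		(void *)(-10*1024*1024 - 4096 + 128);

	/* x86_32: the kernel's own __vvar_page symbol; the new
	 * vsyscall_32.c additionally maps that page into the
	 * VVAR_PAGE fixmap slot */
	vvaraddr_vsyscall_gtod_data = (void *)(&__vvar_page + 128);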

Signed-off-by: Stefani Seibold <[email protected]>
---
arch/x86/Kconfig | 4 +--
arch/x86/include/asm/clocksource.h | 4 ---
arch/x86/include/asm/fixmap.h | 2 ++
arch/x86/include/asm/vvar.h | 12 ++++++--
arch/x86/kernel/Makefile | 3 +-
arch/x86/kernel/hpet.c | 4 ---
arch/x86/kernel/setup.c | 2 --
arch/x86/kernel/tsc.c | 2 --
arch/x86/kernel/vmlinux.lds.S | 3 --
arch/x86/kernel/vsyscall_32.c | 21 +++++++++++++
arch/x86/kernel/vsyscall_64.c | 45 ----------------------------
arch/x86/kernel/vsyscall_gtod.c | 60 ++++++++++++++++++++++++++++++++++++++
arch/x86/tools/relocs.c | 2 +-
13 files changed, 97 insertions(+), 67 deletions(-)
create mode 100644 arch/x86/kernel/vsyscall_32.c
create mode 100644 arch/x86/kernel/vsyscall_gtod.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0af5250..0da3b39 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -107,9 +107,9 @@ config X86
select HAVE_ARCH_SOFT_DIRTY
select CLOCKSOURCE_WATCHDOG
select GENERIC_CLOCKEVENTS
- select ARCH_CLOCKSOURCE_DATA if X86_64
+ select ARCH_CLOCKSOURCE_DATA
select GENERIC_CLOCKEVENTS_BROADCAST if X86_64 || (X86_32 && X86_LOCAL_APIC)
- select GENERIC_TIME_VSYSCALL if X86_64
+ select GENERIC_TIME_VSYSCALL
select KTIME_SCALAR if X86_32
select GENERIC_STRNCPY_FROM_USER
select GENERIC_STRNLEN_USER
diff --git a/arch/x86/include/asm/clocksource.h b/arch/x86/include/asm/clocksource.h
index 16a57f4..eda81dc 100644
--- a/arch/x86/include/asm/clocksource.h
+++ b/arch/x86/include/asm/clocksource.h
@@ -3,8 +3,6 @@
#ifndef _ASM_X86_CLOCKSOURCE_H
#define _ASM_X86_CLOCKSOURCE_H

-#ifdef CONFIG_X86_64
-
#define VCLOCK_NONE 0 /* No vDSO clock available. */
#define VCLOCK_TSC 1 /* vDSO should use vread_tsc. */
#define VCLOCK_HPET 2 /* vDSO should use vread_hpet. */
@@ -14,6 +12,4 @@ struct arch_clocksource_data {
int vclock_mode;
};

-#endif /* CONFIG_X86_64 */
-
#endif /* _ASM_X86_CLOCKSOURCE_H */
diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index 7252cd3..094d0cc 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -75,6 +75,8 @@ enum fixed_addresses {
#ifdef CONFIG_X86_32
FIX_HOLE,
FIX_VDSO,
+ VVAR_PAGE,
+ VSYSCALL_HPET,
#else
VSYSCALL_LAST_PAGE,
VSYSCALL_FIRST_PAGE = VSYSCALL_LAST_PAGE
diff --git a/arch/x86/include/asm/vvar.h b/arch/x86/include/asm/vvar.h
index d76ac40..0a534ea 100644
--- a/arch/x86/include/asm/vvar.h
+++ b/arch/x86/include/asm/vvar.h
@@ -16,9 +16,6 @@
* you mess up, the linker will catch it.)
*/

-/* Base address of vvars. This is not ABI. */
-#define VVAR_ADDRESS (-10*1024*1024 - 4096)
-
#if defined(__VVAR_KERNEL_LDS)

/* The kernel linker script defines its own magic to put vvars in the
@@ -29,6 +26,15 @@

#else

+extern char __vvar_page;
+
+/* Base address of vvars. This is not ABI. */
+#ifdef CONFIG_X86_64
+#define VVAR_ADDRESS (-10*1024*1024 - 4096)
+#else
+#define VVAR_ADDRESS (&__vvar_page)
+#endif
+
#define DECLARE_VVAR(offset, type, name) \
static type const * const vvaraddr_ ## name = \
(void *)(VVAR_ADDRESS + (offset));
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index cb648c8..3282eda 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -26,7 +26,8 @@ obj-$(CONFIG_IRQ_WORK) += irq_work.o
obj-y += probe_roms.o
obj-$(CONFIG_X86_32) += i386_ksyms_32.o
obj-$(CONFIG_X86_64) += sys_x86_64.o x8664_ksyms_64.o
-obj-y += syscall_$(BITS).o
+obj-y += syscall_$(BITS).o vsyscall_gtod.o
+obj-$(CONFIG_X86_32) += vsyscall_32.o
obj-$(CONFIG_X86_64) += vsyscall_64.o
obj-$(CONFIG_X86_64) += vsyscall_emu_64.o
obj-$(CONFIG_SYSFS) += ksysfs.o
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index da85a8e..54263f0 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -74,9 +74,7 @@ static inline void hpet_writel(unsigned int d, unsigned int a)
static inline void hpet_set_mapping(void)
{
hpet_virt_address = ioremap_nocache(hpet_address, HPET_MMAP_SIZE);
-#ifdef CONFIG_X86_64
__set_fixmap(VSYSCALL_HPET, hpet_address, PAGE_KERNEL_VVAR_NOCACHE);
-#endif
}

static inline void hpet_clear_mapping(void)
@@ -752,9 +750,7 @@ static struct clocksource clocksource_hpet = {
.mask = HPET_MASK,
.flags = CLOCK_SOURCE_IS_CONTINUOUS,
.resume = hpet_resume_counter,
-#ifdef CONFIG_X86_64
.archdata = { .vclock_mode = VCLOCK_HPET },
-#endif
};

static int hpet_clocksource_register(void)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 06853e6..56ff330 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1182,9 +1182,7 @@ void __init setup_arch(char **cmdline_p)

tboot_probe();

-#ifdef CONFIG_X86_64
map_vsyscall();
-#endif

generic_apic_probe();

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index acb3b60..a99a490 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -988,9 +988,7 @@ static struct clocksource clocksource_tsc = {
.mask = CLOCKSOURCE_MASK(64),
.flags = CLOCK_SOURCE_IS_CONTINUOUS |
CLOCK_SOURCE_MUST_VERIFY,
-#ifdef CONFIG_X86_64
.archdata = { .vclock_mode = VCLOCK_TSC },
-#endif
};

void mark_tsc_unstable(char *reason)
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index da6b35a..1d4897b 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -147,7 +147,6 @@ SECTIONS
_edata = .;
} :data

-#ifdef CONFIG_X86_64

. = ALIGN(PAGE_SIZE);
__vvar_page = .;
@@ -169,8 +168,6 @@ SECTIONS

. = ALIGN(__vvar_page + PAGE_SIZE, PAGE_SIZE);

-#endif /* CONFIG_X86_64 */
-
/* Init code and data - will be freed after init */
. = ALIGN(PAGE_SIZE);
.init.begin : AT(ADDR(.init.begin) - LOAD_OFFSET) {
diff --git a/arch/x86/kernel/vsyscall_32.c b/arch/x86/kernel/vsyscall_32.c
new file mode 100644
index 0000000..0b72db7
--- /dev/null
+++ b/arch/x86/kernel/vsyscall_32.c
@@ -0,0 +1,21 @@
+/*
+ * Copyright (C) 2001 Andrea Arcangeli <[email protected]> SuSE
+ * Copyright 2003 Andi Kleen, SuSE Labs.
+ *
+ * Modified for x86 32 bit arch by Stefani Seibold <[email protected]>
+ *
+ * Thanks to [email protected] for some useful hint.
+ * Special thanks to Ingo Molnar for his early experience with
+ * a different vsyscall implementation for Linux/IA32 and for the name.
+ *
+ */
+
+#include <asm/vsyscall.h>
+#include <asm/pgtable.h>
+#include <asm/fixmap.h>
+
+void __init map_vsyscall(void)
+{
+ __set_fixmap(VVAR_PAGE, __pa_symbol(&__vvar_page), PAGE_KERNEL_VVAR);
+}
+
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 1f96f93..9ea2876 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -47,14 +47,12 @@
#include <asm/segment.h>
#include <asm/desc.h>
#include <asm/topology.h>
-#include <asm/vgtod.h>
#include <asm/traps.h>

#define CREATE_TRACE_POINTS
#include "vsyscall_trace.h"

DEFINE_VVAR(int, vgetcpu_mode);
-DEFINE_VVAR(struct vsyscall_gtod_data, vsyscall_gtod_data);

static enum { EMULATE, NATIVE, NONE } vsyscall_mode = EMULATE;

@@ -77,48 +75,6 @@ static int __init vsyscall_setup(char *str)
}
early_param("vsyscall", vsyscall_setup);

-void update_vsyscall_tz(void)
-{
- vsyscall_gtod_data.sys_tz = sys_tz;
-}
-
-void update_vsyscall(struct timekeeper *tk)
-{
- struct vsyscall_gtod_data *vdata = &vsyscall_gtod_data;
-
- write_seqcount_begin(&vdata->seq);
-
- /* copy vsyscall data */
- vdata->clock.vclock_mode = tk->clock->archdata.vclock_mode;
- vdata->clock.cycle_last = tk->clock->cycle_last;
- vdata->clock.mask = tk->clock->mask;
- vdata->clock.mult = tk->mult;
- vdata->clock.shift = tk->shift;
-
- vdata->wall_time_sec = tk->xtime_sec;
- vdata->wall_time_snsec = tk->xtime_nsec;
-
- vdata->monotonic_time_sec = tk->xtime_sec
- + tk->wall_to_monotonic.tv_sec;
- vdata->monotonic_time_snsec = tk->xtime_nsec
- + (tk->wall_to_monotonic.tv_nsec
- << tk->shift);
- while (vdata->monotonic_time_snsec >=
- (((u64)NSEC_PER_SEC) << tk->shift)) {
- vdata->monotonic_time_snsec -=
- ((u64)NSEC_PER_SEC) << tk->shift;
- vdata->monotonic_time_sec++;
- }
-
- vdata->wall_time_coarse.tv_sec = tk->xtime_sec;
- vdata->wall_time_coarse.tv_nsec = (long)(tk->xtime_nsec >> tk->shift);
-
- vdata->monotonic_time_coarse = timespec_add(vdata->wall_time_coarse,
- tk->wall_to_monotonic);
-
- write_seqcount_end(&vdata->seq);
-}
-
static void warn_bad_vsyscall(const char *level, struct pt_regs *regs,
const char *message)
{
@@ -374,7 +330,6 @@ void __init map_vsyscall(void)
{
extern char __vsyscall_page;
unsigned long physaddr_vsyscall = __pa_symbol(&__vsyscall_page);
- extern char __vvar_page;
unsigned long physaddr_vvar_page = __pa_symbol(&__vvar_page);

__set_fixmap(VSYSCALL_FIRST_PAGE, physaddr_vsyscall,
diff --git a/arch/x86/kernel/vsyscall_gtod.c b/arch/x86/kernel/vsyscall_gtod.c
new file mode 100644
index 0000000..91862a4
--- /dev/null
+++ b/arch/x86/kernel/vsyscall_gtod.c
@@ -0,0 +1,60 @@
+/*
+ * Copyright (C) 2001 Andrea Arcangeli <[email protected]> SuSE
+ * Copyright 2003 Andi Kleen, SuSE Labs.
+ *
+ * Modified for x86 32 bit architecture by
+ * Stefani Seibold <[email protected]>
+ *
+ * Thanks to [email protected] for some useful hint.
+ * Special thanks to Ingo Molnar for his early experience with
+ * a different vsyscall implementation for Linux/IA32 and for the name.
+ *
+ */
+
+#include <linux/timekeeper_internal.h>
+#include <asm/vgtod.h>
+
+DEFINE_VVAR(struct vsyscall_gtod_data, vsyscall_gtod_data);
+
+void update_vsyscall_tz(void)
+{
+ vsyscall_gtod_data.sys_tz = sys_tz;
+}
+
+void update_vsyscall(struct timekeeper *tk)
+{
+ struct vsyscall_gtod_data *vdata = &vsyscall_gtod_data;
+
+ write_seqcount_begin(&vdata->seq);
+
+ /* copy vsyscall data */
+ vdata->clock.vclock_mode = tk->clock->archdata.vclock_mode;
+ vdata->clock.cycle_last = tk->clock->cycle_last;
+ vdata->clock.mask = tk->clock->mask;
+ vdata->clock.mult = tk->mult;
+ vdata->clock.shift = tk->shift;
+
+ vdata->wall_time_sec = tk->xtime_sec;
+ vdata->wall_time_snsec = tk->xtime_nsec;
+
+ vdata->monotonic_time_sec = tk->xtime_sec
+ + tk->wall_to_monotonic.tv_sec;
+ vdata->monotonic_time_snsec = tk->xtime_nsec
+ + (tk->wall_to_monotonic.tv_nsec
+ << tk->shift);
+ while (vdata->monotonic_time_snsec >=
+ (((u64)NSEC_PER_SEC) << tk->shift)) {
+ vdata->monotonic_time_snsec -=
+ ((u64)NSEC_PER_SEC) << tk->shift;
+ vdata->monotonic_time_sec++;
+ }
+
+ vdata->wall_time_coarse.tv_sec = tk->xtime_sec;
+ vdata->wall_time_coarse.tv_nsec = (long)(tk->xtime_nsec >> tk->shift);
+
+ vdata->monotonic_time_coarse = timespec_add(vdata->wall_time_coarse,
+ tk->wall_to_monotonic);
+
+ write_seqcount_end(&vdata->seq);
+}
+
diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index cfbdbdb..bbb1d22 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -69,8 +69,8 @@ static const char * const sym_regex_kernel[S_NSYMTYPES] = {
"__per_cpu_load|"
"init_per_cpu__.*|"
"__end_rodata_hpage_align|"
- "__vvar_page|"
#endif
+ "__vvar_page|"
"_end)$"
};

--
1.8.5.5

2014-02-16 21:52:46

by Stefani Seibold

Subject: [PATCH v18 08/10] Add 32 bit VDSO time support for 32 bit kernel

This patch adds time support to the 32 bit VDSO on a 32 bit kernel.

For 32 bit programs running on a 32 bit kernel, the same mechanism is
used as for 64 bit programs running on a 64 bit kernel.
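
One detail worth noting: the syscall fallback stubs in this patch save
and restore %ebx by hand, because the vDSO is built with -fpic, where
%ebx holds the GOT pointer and gcc refuses it as an asm operand, while
the 32 bit syscall ABI expects the first argument in %ebx. An annotated
copy of one stub (the comments are added here for illustration, they
are not in the patch):

	notrace static long vdso_fallback_gettime(long clock,
						  struct timespec *ts)
	{
		long ret;

		asm(
			"mov %%ebx, %%edx \n"	/* preserve the PIC register */
			"mov %2, %%ebx \n"	/* arg1: the clock id */
			"call VDSO32_vsyscall \n" /* kernel entry via
						     __kernel_vsyscall */
			"mov %%edx, %%ebx \n"	/* restore the PIC register */
			: "=a" (ret)		/* eax: syscall nr in, result out */
			: "0" (__NR_clock_gettime), "g" (clock), "c" (ts)
			: "memory", "edx");
		return ret;
	}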

Signed-off-by: Stefani Seibold <[email protected]>
---
arch/x86/include/asm/vdso.h | 3 ++
arch/x86/include/asm/vdso32.h | 11 ++++++
arch/x86/vdso/Makefile | 8 ++++
arch/x86/vdso/vclock_gettime.c | 74 ++++++++++++++++++++++++++++++++---
arch/x86/vdso/vdso-layout.lds.S | 22 +++++++++++
arch/x86/vdso/vdso32-setup.c | 53 ++++++++++++++++++++++---
arch/x86/vdso/vdso32/vclock_gettime.c | 36 +++++++++++++++++
arch/x86/vdso/vdso32/vdso32.lds.S | 9 +++++
8 files changed, 204 insertions(+), 12 deletions(-)
create mode 100644 arch/x86/include/asm/vdso32.h
create mode 100644 arch/x86/vdso/vdso32/vclock_gettime.c

diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index fddb53d..fe3cef9 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -2,6 +2,9 @@
#define _ASM_X86_VDSO_H

#if defined CONFIG_X86_32 || defined CONFIG_COMPAT
+
+#include <asm/vdso32.h>
+
extern const char VDSO32_PRELINK[];

/*
diff --git a/arch/x86/include/asm/vdso32.h b/arch/x86/include/asm/vdso32.h
new file mode 100644
index 0000000..7efb701
--- /dev/null
+++ b/arch/x86/include/asm/vdso32.h
@@ -0,0 +1,11 @@
+#ifndef _ASM_X86_VDSO32_H
+#define _ASM_X86_VDSO32_H
+
+#define VDSO_BASE_PAGE 0
+#define VDSO_VVAR_PAGE 1
+#define VDSO_HPET_PAGE 2
+#define VDSO_PAGES 3
+#define VDSO_PREV_PAGES 2
+#define VDSO_OFFSET(x) ((x) * PAGE_SIZE)
+
+#endif
diff --git a/arch/x86/vdso/Makefile b/arch/x86/vdso/Makefile
index fd14be1..92daaa6 100644
--- a/arch/x86/vdso/Makefile
+++ b/arch/x86/vdso/Makefile
@@ -145,8 +145,16 @@ KBUILD_AFLAGS_32 := $(filter-out -m64,$(KBUILD_AFLAGS))
$(vdso32-images:%=$(obj)/%.dbg): KBUILD_AFLAGS = $(KBUILD_AFLAGS_32)
$(vdso32-images:%=$(obj)/%.dbg): asflags-$(CONFIG_X86_64) += -m32

+KBUILD_CFLAGS_32 := $(filter-out -m64,$(KBUILD_CFLAGS))
+KBUILD_CFLAGS_32 := $(filter-out -mcmodel=kernel,$(KBUILD_CFLAGS_32))
+KBUILD_CFLAGS_32 := $(filter-out -fno-pic,$(KBUILD_CFLAGS_32))
+KBUILD_CFLAGS_32 := $(filter-out -mfentry,$(KBUILD_CFLAGS_32))
+KBUILD_CFLAGS_32 += -m32 -msoft-float -mregparm=3 -freg-struct-return -fpic
+$(vdso32-images:%=$(obj)/%.dbg): KBUILD_CFLAGS = $(KBUILD_CFLAGS_32)
+
$(vdso32-images:%=$(obj)/%.dbg): $(obj)/vdso32-%.so.dbg: FORCE \
$(obj)/vdso32/vdso32.lds \
+ $(obj)/vdso32/vclock_gettime.o \
$(obj)/vdso32/note.o \
$(obj)/vdso32/%.o
$(call if_changed,vdso)
diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index 09dae4a..fcbc974 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -4,6 +4,9 @@
*
* Fast user context implementation of clock_gettime, gettimeofday, and time.
*
+ * 32 Bit compat layer by Stefani Seibold <[email protected]>
+ * sponsored by Rohde & Schwarz GmbH & Co. KG Munich/Germany
+ *
* The code should have no internal unresolved relocations.
* Check with readelf after changing.
*/
@@ -12,13 +15,11 @@
#define DISABLE_BRANCH_PROFILING

#include <linux/kernel.h>
-#include <linux/posix-timers.h>
-#include <linux/time.h>
+#include <uapi/linux/time.h>
#include <linux/string.h>
#include <asm/vsyscall.h>
#include <asm/fixmap.h>
#include <asm/vgtod.h>
-#include <asm/timex.h>
#include <asm/hpet.h>
#include <asm/unistd.h>
#include <asm/io.h>
@@ -26,6 +27,12 @@

#define gtod (&VVAR(vsyscall_gtod_data))

+extern int __vdso_clock_gettime(clockid_t clock, struct timespec *ts);
+extern int __vdso_gettimeofday(struct timeval *tv, struct timezone *tz);
+extern time_t __vdso_time(time_t *t);
+
+#ifndef BUILD_VDSO32
+
static notrace cycle_t vread_hpet(void)
{
return readl((const void __iomem *)fix_to_virt(VSYSCALL_HPET) + HPET_COUNTER);
@@ -118,6 +125,59 @@ static notrace cycle_t vread_pvclock(int *mode)
}
#endif

+#else
+
+extern u8 hpet_page
+ __attribute__((visibility("hidden")));
+
+#ifdef CONFIG_HPET_TIMER
+static notrace cycle_t vread_hpet(void)
+{
+ return readl((const void __iomem *)(&hpet_page + HPET_COUNTER));
+}
+#endif
+
+notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
+{
+ long ret;
+
+ asm(
+ "mov %%ebx, %%edx \n"
+ "mov %2, %%ebx \n"
+ "call VDSO32_vsyscall \n"
+ "mov %%edx, %%ebx \n"
+ : "=a" (ret)
+ : "0" (__NR_clock_gettime), "g" (clock), "c" (ts)
+ : "memory", "edx");
+ return ret;
+}
+
+notrace static long vdso_fallback_gtod(struct timeval *tv, struct timezone *tz)
+{
+ long ret;
+
+ asm(
+ "mov %%ebx, %%edx \n"
+ "mov %2, %%ebx \n"
+ "call VDSO32_vsyscall \n"
+ "mov %%edx, %%ebx \n"
+ : "=a" (ret)
+ : "0" (__NR_gettimeofday), "g" (tv), "c" (tz)
+ : "memory", "edx");
+ return ret;
+}
+
+#ifdef CONFIG_PARAVIRT_CLOCK
+
+static notrace cycle_t vread_pvclock(int *mode)
+{
+ *mode = VCLOCK_NONE;
+ return 0;
+}
+#endif
+
+#endif
+
notrace static cycle_t vread_tsc(void)
{
cycle_t ret;
@@ -131,7 +191,7 @@ notrace static cycle_t vread_tsc(void)
* but no one has ever seen it happen.
*/
rdtsc_barrier();
- ret = (cycle_t)vget_cycles();
+ ret = (cycle_t)__native_read_tsc();

last = gtod->clock.cycle_last;

@@ -152,12 +212,14 @@ notrace static cycle_t vread_tsc(void)

notrace static inline u64 vgetsns(int *mode)
{
- long v;
+ u64 v;
cycles_t cycles;
if (gtod->clock.vclock_mode == VCLOCK_TSC)
cycles = vread_tsc();
+#ifdef CONFIG_HPET_TIMER
else if (gtod->clock.vclock_mode == VCLOCK_HPET)
cycles = vread_hpet();
+#endif
#ifdef CONFIG_PARAVIRT_CLOCK
else if (gtod->clock.vclock_mode == VCLOCK_PVCLOCK)
cycles = vread_pvclock(mode);
@@ -284,7 +346,7 @@ int gettimeofday(struct timeval *, struct timezone *)
*/
notrace time_t __vdso_time(time_t *t)
{
- /* This is atomic on x86_64 so we don't need any locks. */
+ /* This is atomic on x86 so we don't need any locks. */
time_t result = ACCESS_ONCE(gtod->wall_time_sec);

if (t)
diff --git a/arch/x86/vdso/vdso-layout.lds.S b/arch/x86/vdso/vdso-layout.lds.S
index 634a2cf..1261437 100644
--- a/arch/x86/vdso/vdso-layout.lds.S
+++ b/arch/x86/vdso/vdso-layout.lds.S
@@ -6,6 +6,24 @@

SECTIONS
{
+#ifdef BUILD_VDSO32
+#include <asm/vdso32.h>
+
+ .hpet_sect : {
+ hpet_page = . - VDSO_OFFSET(VDSO_HPET_PAGE);
+ } :text :hpet_sect
+
+ .vvar_sect : {
+ vvar = . - VDSO_OFFSET(VDSO_VVAR_PAGE);
+
+ /* Place all vvars at the offsets in asm/vvar.h. */
+#define EMIT_VVAR(name, offset) vvar_ ## name = vvar + offset;
+#define __VVAR_KERNEL_LDS
+#include <asm/vvar.h>
+#undef __VVAR_KERNEL_LDS
+#undef EMIT_VVAR
+ } :text :vvar_sect
+#endif
. = VDSO_PRELINK + SIZEOF_HEADERS;

.hash : { *(.hash) } :text
@@ -61,4 +79,8 @@ PHDRS
dynamic PT_DYNAMIC FLAGS(4); /* PF_R */
note PT_NOTE FLAGS(4); /* PF_R */
eh_frame_hdr PT_GNU_EH_FRAME;
+#ifdef BUILD_VDSO32
+ vvar_sect PT_NULL FLAGS(4); /* PF_R */
+ hpet_sect PT_NULL FLAGS(4); /* PF_R */
+#endif
}
diff --git a/arch/x86/vdso/vdso32-setup.c b/arch/x86/vdso/vdso32-setup.c
index d6bfb87..9b57770 100644
--- a/arch/x86/vdso/vdso32-setup.c
+++ b/arch/x86/vdso/vdso32-setup.c
@@ -25,6 +25,9 @@
#include <asm/tlbflush.h>
#include <asm/vdso.h>
#include <asm/proto.h>
+#include <asm/fixmap.h>
+#include <asm/hpet.h>
+#include <asm/vvar.h>

enum {
VDSO_DISABLED = 0,
@@ -193,7 +196,7 @@ static __init void relocate_vdso(Elf32_Ehdr *ehdr)
}
}

-static struct page *vdso32_pages[1];
+static struct page *vdso32_pages[VDSO_PAGES];

#ifdef CONFIG_X86_64

@@ -310,6 +313,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
unsigned long addr;
int ret = 0;
bool compat;
+ struct vm_area_struct *vma;

#ifdef CONFIG_X86_X32_ABI
if (test_thread_flag(TIF_X32))
@@ -330,11 +334,13 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
if (compat)
addr = VDSO_HIGH_BASE;
else {
- addr = get_unmapped_area(NULL, 0, PAGE_SIZE, 0, 0);
+ addr = get_unmapped_area(NULL, 0, VDSO_OFFSET(VDSO_PAGES), 0, 0);
if (IS_ERR_VALUE(addr)) {
ret = addr;
goto up_fail;
}
+
+ addr += VDSO_OFFSET(VDSO_PREV_PAGES);
}

current->mm->context.vdso = (void *)addr;
@@ -343,13 +349,48 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
/*
* MAYWRITE to allow gdb to COW and set breakpoints
*/
- ret = install_special_mapping(mm, addr, PAGE_SIZE,
- VM_READ|VM_EXEC|
- VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
- vdso32_pages);
+ ret = install_special_mapping(mm,
+ addr,
+ VDSO_OFFSET(VDSO_PAGES - VDSO_PREV_PAGES),
+ VM_READ|VM_EXEC|
+ VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
+ vdso32_pages);

if (ret)
goto up_fail;
+
+ vma = _install_special_mapping(mm,
+ addr - VDSO_OFFSET(VDSO_PREV_PAGES),
+ VDSO_OFFSET(VDSO_PREV_PAGES),
+ VM_READ,
+ NULL);
+
+ if (IS_ERR(vma)) {
+ ret = PTR_ERR(vma);
+ goto up_fail;
+ }
+
+ ret = remap_pfn_range(vma,
+ addr - VDSO_OFFSET(VDSO_VVAR_PAGE),
+ __pa_symbol(&__vvar_page) >> PAGE_SHIFT,
+ PAGE_SIZE,
+ PAGE_READONLY);
+
+ if (ret)
+ goto up_fail;
+
+#ifdef CONFIG_HPET_TIMER
+ if (hpet_address) {
+ ret = io_remap_pfn_range(vma,
+ addr - VDSO_OFFSET(VDSO_HPET_PAGE),
+ hpet_address >> PAGE_SHIFT,
+ PAGE_SIZE,
+ pgprot_noncached(PAGE_READONLY));
+
+ if (ret)
+ goto up_fail;
+ }
+#endif
}

current_thread_info()->sysenter_return =
diff --git a/arch/x86/vdso/vdso32/vclock_gettime.c b/arch/x86/vdso/vdso32/vclock_gettime.c
new file mode 100644
index 0000000..ffdbdb1
--- /dev/null
+++ b/arch/x86/vdso/vdso32/vclock_gettime.c
@@ -0,0 +1,36 @@
+#define BUILD_VDSO32
+
+#ifdef CONFIG_X86_64
+
+/*
+ * Due the -m32 compilation, there will be a lot of
+ * "warning: integer constant is too large for 'unsigned long' type",
+ * because an unsigned long is only 32 bit.
+ */
+
+/*
+ * Prevents the include of arch/x86/include/asm/page.h, which will generate
+ * a lot of warnings.
+ */
+#define _ASM_X86_PAGE_H
+
+/*
+ * The unneeded inline function phys_to_virt() in arch/x86/include/asm/io.h
+ * depends on the __va(), which comes from arch/x86/include/asm/page.h.
+ * So add a dummy for this.
+ *
+ * It is save, since this functions not used in arch/x86/vdso/vclock_gettime.c
+ */
+#define __va(x) 0
+
+/*
+ * The define of CONFIG_ILLEGAL_POINTER_VALUE is also to prevent the
+ * "warning: integer constant is too large..."
+ */
+#undef CONFIG_ILLEGAL_POINTER_VALUE
+#define CONFIG_ILLEGAL_POINTER_VALUE 0
+
+#endif
+
+#include "../vclock_gettime.c"
+
diff --git a/arch/x86/vdso/vdso32/vdso32.lds.S b/arch/x86/vdso/vdso32/vdso32.lds.S
index 976124b..bc8bf6d 100644
--- a/arch/x86/vdso/vdso32/vdso32.lds.S
+++ b/arch/x86/vdso/vdso32/vdso32.lds.S
@@ -8,6 +8,9 @@
* values visible using the asm-x86/vdso.h macros from the kernel proper.
*/

+#include <asm/page.h>
+
+#define BUILD_VDSO32
#define VDSO_PRELINK 0
#include "../vdso-layout.lds.S"

@@ -24,6 +27,9 @@ VERSION
__kernel_vsyscall;
__kernel_sigreturn;
__kernel_rt_sigreturn;
+ __vdso_clock_gettime;
+ __vdso_gettimeofday;
+ __vdso_time;
local: *;
};
}
@@ -35,3 +41,6 @@ VDSO32_PRELINK = VDSO_PRELINK;
VDSO32_vsyscall = __kernel_vsyscall;
VDSO32_sigreturn = __kernel_sigreturn;
VDSO32_rt_sigreturn = __kernel_rt_sigreturn;
+VDSO32_clock_gettime = clock_gettime;
+VDSO32_gettimeofday = gettimeofday;
+VDSO32_time = time;
--
1.8.5.5

2014-02-16 21:52:19

by Stefani Seibold

Subject: [PATCH v18 07/10] introduce VVAR macro for vdso32

This patch revamps vvar.h to introduce the VVAR macro for vdso32.
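
A sketch of what the two variants expand to for vsyscall_gtod_data
(both taken from the hunk below; 128 is its vvar offset):

	/* vdso32 (BUILD_VDSO32): a hidden symbol, placed at the right
	 * offset by the vdso linker script */
	extern struct vsyscall_gtod_data vvar_vsyscall_gtod_data
		__attribute__((visibility("hidden")));
	#define VVAR(name) (vvar_ ## name)

	/* kernel and 64 bit vdso: a dereference of the fixed address */
	static struct vsyscall_gtod_data const * const
		vvaraddr_vsyscall_gtod_data = (void *)(VVAR_ADDRESS + 128);
	#define VVAR(name) (*vvaraddr_ ## name)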

Signed-off-by: Stefani Seibold <[email protected]>
---
arch/x86/include/asm/vvar.h | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/vvar.h b/arch/x86/include/asm/vvar.h
index 0a534ea..52c79ff 100644
--- a/arch/x86/include/asm/vvar.h
+++ b/arch/x86/include/asm/vvar.h
@@ -26,6 +26,15 @@

#else

+#ifdef BUILD_VDSO32
+
+#define DECLARE_VVAR(offset, type, name) \
+ extern type vvar_ ## name __attribute__((visibility("hidden")));
+
+#define VVAR(name) (vvar_ ## name)
+
+#else
+
extern char __vvar_page;

/* Base address of vvars. This is not ABI. */
@@ -39,12 +48,13 @@ extern char __vvar_page;
static type const * const vvaraddr_ ## name = \
(void *)(VVAR_ADDRESS + (offset));

+#define VVAR(name) (*vvaraddr_ ## name)
+#endif
+
#define DEFINE_VVAR(type, name) \
type name \
__attribute__((section(".vvar_" #name), aligned(16))) __visible

-#define VVAR(name) (*vvaraddr_ ## name)
-
#endif

/* DECLARE_VVAR(offset, type, name) */
--
1.8.5.5

2014-02-16 21:53:26

by Stefani Seibold

Subject: [PATCH v18 03/10] revamp vclock_gettime.c

This intermediate patch revamps vclock_gettime.c by moving some functions
around. It exists only for splitting purposes, to make the whole 32 bit
vdso timer patch easier to review.

Signed-off-by: Stefani Seibold <[email protected]>
---
arch/x86/vdso/vclock_gettime.c | 85 +++++++++++++++++++++---------------------
1 file changed, 42 insertions(+), 43 deletions(-)

diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index eb5d7a5..bbc8065 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -26,41 +26,26 @@

#define gtod (&VVAR(vsyscall_gtod_data))

-notrace static cycle_t vread_tsc(void)
+static notrace cycle_t vread_hpet(void)
{
- cycle_t ret;
- u64 last;
-
- /*
- * Empirically, a fence (of type that depends on the CPU)
- * before rdtsc is enough to ensure that rdtsc is ordered
- * with respect to loads. The various CPU manuals are unclear
- * as to whether rdtsc can be reordered with later loads,
- * but no one has ever seen it happen.
- */
- rdtsc_barrier();
- ret = (cycle_t)vget_cycles();
-
- last = VVAR(vsyscall_gtod_data).clock.cycle_last;
-
- if (likely(ret >= last))
- return ret;
+ return readl((const void __iomem *)fix_to_virt(VSYSCALL_HPET) + HPET_COUNTER);
+}

- /*
- * GCC likes to generate cmov here, but this branch is extremely
- * predictable (it's just a funciton of time and the likely is
- * very likely) and there's a data dependence, so force GCC
- * to generate a branch instead. I don't barrier() because
- * we don't actually need a barrier, and if this function
- * ever gets inlined it will generate worse code.
- */
- asm volatile ("");
- return last;
+notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
+{
+ long ret;
+ asm("syscall" : "=a" (ret) :
+ "0" (__NR_clock_gettime), "D" (clock), "S" (ts) : "memory");
+ return ret;
}

-static notrace cycle_t vread_hpet(void)
+notrace static long vdso_fallback_gtod(struct timeval *tv, struct timezone *tz)
{
- return readl((const void __iomem *)fix_to_virt(VSYSCALL_HPET) + HPET_COUNTER);
+ long ret;
+
+ asm("syscall" : "=a" (ret) :
+ "0" (__NR_gettimeofday), "D" (tv), "S" (tz) : "memory");
+ return ret;
}

#ifdef CONFIG_PARAVIRT_CLOCK
@@ -133,23 +118,37 @@ static notrace cycle_t vread_pvclock(int *mode)
}
#endif

-notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
+notrace static cycle_t vread_tsc(void)
{
- long ret;
- asm("syscall" : "=a" (ret) :
- "0" (__NR_clock_gettime),"D" (clock), "S" (ts) : "memory");
- return ret;
-}
+ cycle_t ret;
+ u64 last;

-notrace static long vdso_fallback_gtod(struct timeval *tv, struct timezone *tz)
-{
- long ret;
+ /*
+ * Empirically, a fence (of type that depends on the CPU)
+ * before rdtsc is enough to ensure that rdtsc is ordered
+ * with respect to loads. The various CPU manuals are unclear
+ * as to whether rdtsc can be reordered with later loads,
+ * but no one has ever seen it happen.
+ */
+ rdtsc_barrier();
+ ret = (cycle_t)vget_cycles();

- asm("syscall" : "=a" (ret) :
- "0" (__NR_gettimeofday), "D" (tv), "S" (tz) : "memory");
- return ret;
-}
+ last = VVAR(vsyscall_gtod_data).clock.cycle_last;

+ if (likely(ret >= last))
+ return ret;
+
+ /*
+ * GCC likes to generate cmov here, but this branch is extremely
+ * predictable (it's just a funciton of time and the likely is
+ * very likely) and there's a data dependence, so force GCC
+ * to generate a branch instead. I don't barrier() because
+ * we don't actually need a barrier, and if this function
+ * ever gets inlined it will generate worse code.
+ */
+ asm volatile ("");
+ return last;
+}

notrace static inline u64 vgetsns(int *mode)
{
--
1.8.5.5

2014-02-16 21:53:45

by Stefani Seibold

Subject: [PATCH v18 05/10] replace VVAR(vsyscall_gtod_data) by gtod macro

There are currently more than 30 users of the gtod macro, so replace the
last remaining VVAR(vsyscall_gtod_data) accesses with the gtod macro.

Signed-off-by: Stefani Seibold <[email protected]>
---
arch/x86/vdso/vclock_gettime.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index fd074dd..743f277 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -109,7 +109,7 @@ static notrace cycle_t vread_pvclock(int *mode)
*mode = VCLOCK_NONE;

/* refer to tsc.c read_tsc() comment for rationale */
- last = VVAR(vsyscall_gtod_data).clock.cycle_last;
+ last = gtod->clock.cycle_last;

if (likely(ret >= last))
return ret;
@@ -133,7 +133,7 @@ notrace static cycle_t vread_tsc(void)
rdtsc_barrier();
ret = (cycle_t)vget_cycles();

- last = VVAR(vsyscall_gtod_data).clock.cycle_last;
+ last = gtod->clock.cycle_last;

if (likely(ret >= last))
return ret;
@@ -288,7 +288,7 @@ int gettimeofday(struct timeval *, struct timezone *)
notrace time_t __vdso_time(time_t *t)
{
/* This is atomic on x86_64 so we don't need any locks. */
- time_t result = ACCESS_ONCE(VVAR(vsyscall_gtod_data).wall_time_sec);
+ time_t result = ACCESS_ONCE(gtod->wall_time_sec);

if (t)
*t = result;
--
1.8.5.5

2014-02-16 21:54:25

by Stefani Seibold

Subject: [PATCH v18 09/10] Add 32 bit VDSO time support for 64 bit kernel

This patch adds the VDSO time support for the IA32 emulation layer.

Due to the nature of the kernel headers and the LP64 compiler, where the
sizes of a long and a pointer differ from those under a 32 bit compiler,
some type hacking is necessary for optimal performance.

The vsyscall_gtod_data structure must be rearranged to serve 32- and
64-bit code access at the same time:

- The seqcount_t was replaced by an unsigned int; this makes
vsyscall_gtod_data independent of kernel configuration and internal
functions.
- All kernel-internal structures are replaced by fixed-size elements
which work for both 32- and 64-bit access.
- The inner struct clock was removed to pack the whole struct.

The "unsigned seq" is handled by open-coded helpers derived from
seqcount_t.
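
A sketch of the resulting read-side pattern (matching the gtod_read_*
helpers added to vgtod.h in the hunk below):

	do {
		seq = gtod_read_begin(gtod); /* spin while a write is in flight */
		mode = gtod->vclock_mode;
		ts->tv_sec = gtod->wall_time_sec;
		ns = gtod->wall_time_snsec;
		ns += vgetsns(&mode);
		ns >>= gtod->shift;
	} while (unlikely(gtod_read_retry(gtod, seq))); /* raced: retry */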

Signed-off-by: Stefani Seibold <[email protected]>
---
arch/x86/include/asm/vgtod.h | 69 ++++++++++++++++++++++++++++-------
arch/x86/include/asm/vvar.h | 5 +++
arch/x86/kernel/vsyscall_gtod.c | 33 +++++++++++------
arch/x86/vdso/vclock_gettime.c | 68 +++++++++++++++++-----------------
arch/x86/vdso/vdso32/vclock_gettime.c | 33 +++++++++++++++++
5 files changed, 148 insertions(+), 60 deletions(-)

diff --git a/arch/x86/include/asm/vgtod.h b/arch/x86/include/asm/vgtod.h
index 46e24d3..abb9e45 100644
--- a/arch/x86/include/asm/vgtod.h
+++ b/arch/x86/include/asm/vgtod.h
@@ -4,27 +4,70 @@
#include <asm/vsyscall.h>
#include <linux/clocksource.h>

+#ifdef CONFIG_X86_64
+typedef u64 gtod_long_t;
+#else
+typedef u32 gtod_long_t;
+#endif
+/*
+ * vsyscall_gtod_data will be accessed by 32 and 64 bit code at the same time
+ * so be carefull by modifying this structure.
+ */
struct vsyscall_gtod_data {
- seqcount_t seq;
+ unsigned seq;

- struct { /* extract of a clocksource struct */
- int vclock_mode;
- cycle_t cycle_last;
- cycle_t mask;
- u32 mult;
- u32 shift;
- } clock;
+ int vclock_mode;
+ cycle_t cycle_last;
+ cycle_t mask;
+ u32 mult;
+ u32 shift;

/* open coded 'struct timespec' */
- time_t wall_time_sec;
u64 wall_time_snsec;
+ gtod_long_t wall_time_sec;
+ gtod_long_t monotonic_time_sec;
u64 monotonic_time_snsec;
- time_t monotonic_time_sec;
+ gtod_long_t wall_time_coarse_sec;
+ gtod_long_t wall_time_coarse_nsec;
+ gtod_long_t monotonic_time_coarse_sec;
+ gtod_long_t monotonic_time_coarse_nsec;

- struct timezone sys_tz;
- struct timespec wall_time_coarse;
- struct timespec monotonic_time_coarse;
+ int tz_minuteswest;
+ int tz_dsttime;
};
extern struct vsyscall_gtod_data vsyscall_gtod_data;

+static inline unsigned gtod_read_begin(const struct vsyscall_gtod_data *s)
+{
+ unsigned ret;
+
+repeat:
+ ret = ACCESS_ONCE(s->seq);
+ if (unlikely(ret & 1)) {
+ cpu_relax();
+ goto repeat;
+ }
+ smp_rmb();
+ return ret;
+}
+
+static inline int gtod_read_retry(const struct vsyscall_gtod_data *s,
+ unsigned start)
+{
+ smp_rmb();
+ return unlikely(s->seq != start);
+}
+
+static inline void gtod_write_begin(struct vsyscall_gtod_data *s)
+{
+ ++s->seq;
+ smp_wmb();
+}
+
+static inline void gtod_write_end(struct vsyscall_gtod_data *s)
+{
+ smp_wmb();
+ ++s->seq;
+}
+
#endif /* _ASM_X86_VGTOD_H */
diff --git a/arch/x86/include/asm/vvar.h b/arch/x86/include/asm/vvar.h
index 52c79ff..081d909 100644
--- a/arch/x86/include/asm/vvar.h
+++ b/arch/x86/include/asm/vvar.h
@@ -16,6 +16,9 @@
* you mess up, the linker will catch it.)
*/

+#ifndef _ASM_X86_VVAR_H
+#define _ASM_X86_VVAR_H
+
#if defined(__VVAR_KERNEL_LDS)

/* The kernel linker script defines its own magic to put vvars in the
@@ -64,3 +67,5 @@ DECLARE_VVAR(16, int, vgetcpu_mode)
DECLARE_VVAR(128, struct vsyscall_gtod_data, vsyscall_gtod_data)

#undef DECLARE_VVAR
+
+#endif
diff --git a/arch/x86/kernel/vsyscall_gtod.c b/arch/x86/kernel/vsyscall_gtod.c
index 91862a4..973dcc4 100644
--- a/arch/x86/kernel/vsyscall_gtod.c
+++ b/arch/x86/kernel/vsyscall_gtod.c
@@ -4,6 +4,7 @@
*
* Modified for x86 32 bit architecture by
* Stefani Seibold <[email protected]>
+ * sponsored by Rohde & Schwarz GmbH & Co. KG Munich/Germany
*
* Thanks to [email protected] for some useful hint.
* Special thanks to Ingo Molnar for his early experience with
@@ -18,21 +19,22 @@ DEFINE_VVAR(struct vsyscall_gtod_data, vsyscall_gtod_data);

void update_vsyscall_tz(void)
{
- vsyscall_gtod_data.sys_tz = sys_tz;
+ vsyscall_gtod_data.tz_minuteswest = sys_tz.tz_minuteswest;
+ vsyscall_gtod_data.tz_dsttime = sys_tz.tz_dsttime;
}

void update_vsyscall(struct timekeeper *tk)
{
struct vsyscall_gtod_data *vdata = &vsyscall_gtod_data;

- write_seqcount_begin(&vdata->seq);
+ gtod_write_begin(vdata);

/* copy vsyscall data */
- vdata->clock.vclock_mode = tk->clock->archdata.vclock_mode;
- vdata->clock.cycle_last = tk->clock->cycle_last;
- vdata->clock.mask = tk->clock->mask;
- vdata->clock.mult = tk->mult;
- vdata->clock.shift = tk->shift;
+ vdata->vclock_mode = tk->clock->archdata.vclock_mode;
+ vdata->cycle_last = tk->clock->cycle_last;
+ vdata->mask = tk->clock->mask;
+ vdata->mult = tk->mult;
+ vdata->shift = tk->shift;

vdata->wall_time_sec = tk->xtime_sec;
vdata->wall_time_snsec = tk->xtime_nsec;
@@ -49,12 +51,19 @@ void update_vsyscall(struct timekeeper *tk)
vdata->monotonic_time_sec++;
}

- vdata->wall_time_coarse.tv_sec = tk->xtime_sec;
- vdata->wall_time_coarse.tv_nsec = (long)(tk->xtime_nsec >> tk->shift);
+ vdata->wall_time_coarse_sec = tk->xtime_sec;
+ vdata->wall_time_coarse_nsec = (long)(tk->xtime_nsec >> tk->shift);

- vdata->monotonic_time_coarse = timespec_add(vdata->wall_time_coarse,
- tk->wall_to_monotonic);
+ vdata->monotonic_time_coarse_sec =
+ vdata->wall_time_coarse_sec + tk->wall_to_monotonic.tv_sec;
+ vdata->monotonic_time_coarse_nsec =
+ vdata->wall_time_coarse_nsec + tk->wall_to_monotonic.tv_nsec;

- write_seqcount_end(&vdata->seq);
+ while (vdata->monotonic_time_coarse_nsec >= NSEC_PER_SEC) {
+ vdata->monotonic_time_coarse_nsec -= NSEC_PER_SEC;
+ vdata->monotonic_time_coarse_sec++;
+ }
+
+ gtod_write_end(vdata);
}

diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index fcbc974..b2c5d39 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -116,7 +116,7 @@ static notrace cycle_t vread_pvclock(int *mode)
*mode = VCLOCK_NONE;

/* refer to tsc.c read_tsc() comment for rationale */
- last = gtod->clock.cycle_last;
+ last = gtod->cycle_last;

if (likely(ret >= last))
return ret;
@@ -147,7 +147,7 @@ notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
"call VDSO32_vsyscall \n"
"mov %%edx, %%ebx \n"
: "=a" (ret)
- : "0" (__NR_clock_gettime), "g" (clock), "c" (ts)
+ : "0" (__NR_ia32_clock_gettime), "g" (clock), "c" (ts)
: "memory", "edx");
return ret;
}
@@ -162,7 +162,7 @@ notrace static long vdso_fallback_gtod(struct timeval *tv, struct timezone *tz)
"call VDSO32_vsyscall \n"
"mov %%edx, %%ebx \n"
: "=a" (ret)
- : "0" (__NR_gettimeofday), "g" (tv), "c" (tz)
+ : "0" (__NR_ia32_gettimeofday), "g" (tv), "c" (tz)
: "memory", "edx");
return ret;
}
@@ -193,7 +193,7 @@ notrace static cycle_t vread_tsc(void)
rdtsc_barrier();
ret = (cycle_t)__native_read_tsc();

- last = gtod->clock.cycle_last;
+ last = gtod->cycle_last;

if (likely(ret >= last))
return ret;
@@ -214,20 +214,20 @@ notrace static inline u64 vgetsns(int *mode)
{
u64 v;
cycles_t cycles;
- if (gtod->clock.vclock_mode == VCLOCK_TSC)
+ if (gtod->vclock_mode == VCLOCK_TSC)
cycles = vread_tsc();
#ifdef CONFIG_HPET_TIMER
- else if (gtod->clock.vclock_mode == VCLOCK_HPET)
+ else if (gtod->vclock_mode == VCLOCK_HPET)
cycles = vread_hpet();
#endif
#ifdef CONFIG_PARAVIRT_CLOCK
- else if (gtod->clock.vclock_mode == VCLOCK_PVCLOCK)
+ else if (gtod->vclock_mode == VCLOCK_PVCLOCK)
cycles = vread_pvclock(mode);
#endif
else
return 0;
- v = (cycles - gtod->clock.cycle_last) & gtod->clock.mask;
- return v * gtod->clock.mult;
+ v = (cycles - gtod->cycle_last) & gtod->mask;
+ return v * gtod->mult;
}

/* Code size doesn't matter (vdso is 4k anyway) and this is faster. */
@@ -237,17 +237,18 @@ notrace static int __always_inline do_realtime(struct timespec *ts)
u64 ns;
int mode;

- ts->tv_nsec = 0;
do {
- seq = raw_read_seqcount_begin(&gtod->seq);
- mode = gtod->clock.vclock_mode;
+ seq = gtod_read_begin(gtod);
+ mode = gtod->vclock_mode;
ts->tv_sec = gtod->wall_time_sec;
ns = gtod->wall_time_snsec;
ns += vgetsns(&mode);
- ns >>= gtod->clock.shift;
- } while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
+ ns >>= gtod->shift;
+ } while (unlikely(gtod_read_retry(gtod, seq)));
+
+ ts->tv_sec += __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
+ ts->tv_nsec = ns;

- timespec_add_ns(ts, ns);
return mode;
}

@@ -257,16 +258,17 @@ notrace static int do_monotonic(struct timespec *ts)
u64 ns;
int mode;

- ts->tv_nsec = 0;
do {
- seq = raw_read_seqcount_begin(&gtod->seq);
- mode = gtod->clock.vclock_mode;
+ seq = gtod_read_begin(gtod);
+ mode = gtod->vclock_mode;
ts->tv_sec = gtod->monotonic_time_sec;
ns = gtod->monotonic_time_snsec;
ns += vgetsns(&mode);
- ns >>= gtod->clock.shift;
- } while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
- timespec_add_ns(ts, ns);
+ ns >>= gtod->shift;
+ } while (unlikely(gtod_read_retry(gtod, seq)));
+
+ ts->tv_sec += __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
+ ts->tv_nsec = ns;

return mode;
}
@@ -275,20 +277,20 @@ notrace static void do_realtime_coarse(struct timespec *ts)
{
unsigned long seq;
do {
- seq = raw_read_seqcount_begin(&gtod->seq);
- ts->tv_sec = gtod->wall_time_coarse.tv_sec;
- ts->tv_nsec = gtod->wall_time_coarse.tv_nsec;
- } while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
+ seq = gtod_read_begin(gtod);
+ ts->tv_sec = gtod->wall_time_coarse_sec;
+ ts->tv_nsec = gtod->wall_time_coarse_nsec;
+ } while (unlikely(gtod_read_retry(gtod, seq)));
}

notrace static void do_monotonic_coarse(struct timespec *ts)
{
unsigned long seq;
do {
- seq = raw_read_seqcount_begin(&gtod->seq);
- ts->tv_sec = gtod->monotonic_time_coarse.tv_sec;
- ts->tv_nsec = gtod->monotonic_time_coarse.tv_nsec;
- } while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
+ seq = gtod_read_begin(gtod);
+ ts->tv_sec = gtod->monotonic_time_coarse_sec;
+ ts->tv_nsec = gtod->monotonic_time_coarse_nsec;
+ } while (unlikely(gtod_read_retry(gtod, seq)));
}

notrace int __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
@@ -322,17 +324,13 @@ int clock_gettime(clockid_t, struct timespec *)
notrace int __vdso_gettimeofday(struct timeval *tv, struct timezone *tz)
{
if (likely(tv != NULL)) {
- BUILD_BUG_ON(offsetof(struct timeval, tv_usec) !=
- offsetof(struct timespec, tv_nsec) ||
- sizeof(*tv) != sizeof(struct timespec));
if (unlikely(do_realtime((struct timespec *)tv) == VCLOCK_NONE))
return vdso_fallback_gtod(tv, tz);
tv->tv_usec /= 1000;
}
if (unlikely(tz != NULL)) {
- /* Avoid memcpy. Some old compilers fail to inline it */
- tz->tz_minuteswest = gtod->sys_tz.tz_minuteswest;
- tz->tz_dsttime = gtod->sys_tz.tz_dsttime;
+ tz->tz_minuteswest = gtod->tz_minuteswest;
+ tz->tz_dsttime = gtod->tz_dsttime;
}

return 0;
diff --git a/arch/x86/vdso/vdso32/vclock_gettime.c b/arch/x86/vdso/vdso32/vclock_gettime.c
index ffdbdb1..a578e44 100644
--- a/arch/x86/vdso/vdso32/vclock_gettime.c
+++ b/arch/x86/vdso/vdso32/vclock_gettime.c
@@ -2,6 +2,8 @@

#ifdef CONFIG_X86_64

+#include <asm/unistd_32_ia32.h>
+
/*
* Due the -m32 compilation, there will be a lot of
* "warning: integer constant is too large for 'unsigned long' type",
@@ -24,12 +26,43 @@
#define __va(x) 0

/*
+ * function load_cr3() in x86/include/asm/processor.h depends on __pa().
+ * Replace by a dummy is save since not used
+ */
+#define __pa(x) 0
+
+/*
+ * Prevents the include of arch/x86/include/asm/spinlock.h, which will generate
+ * a lot of warnings with make C=1.
+ * It is imposible not to include spinlock.h since most kernel headers does
+ * include it.
+ */
+#define _ASM_X86_SPINLOCK_H
+
+/*
+ * dummys for unneeded arck_spin functions
+ */
+static inline void arch_spin_unlock_wait(void *p)
+{
+}
+
+static inline int arch_spin_is_locked(void *p)
+{
+ return 0;
+}
+
+/*
* The define of CONFIG_ILLEGAL_POINTER_VALUE is also to prevent the
* "warning: integer constant is too large..."
*/
#undef CONFIG_ILLEGAL_POINTER_VALUE
#define CONFIG_ILLEGAL_POINTER_VALUE 0

+#else
+
+#define __NR_ia32_clock_gettime __NR_clock_gettime
+#define __NR_ia32_gettimeofday __NR_gettimeofday
+
#endif

#include "../vclock_gettime.c"
--
1.8.5.5

2014-02-16 21:54:23

by Stefani Seibold

Subject: [PATCH v18 04/10] vclock_gettime.c __vdso_clock_gettime cleanup

This patch is a small code cleanup for the __vdso_clock_gettime() function.

It removes the unneeded return values from do_monotonic_coarse() and
do_realtime_coarse() and adds a fallback label for issuing the kernel
clock_gettime() system call.

Signed-off-by: Stefani Seibold <[email protected]>
---
arch/x86/vdso/vclock_gettime.c | 27 ++++++++++++++-------------
1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index bbc8065..fd074dd 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -209,7 +209,7 @@ notrace static int do_monotonic(struct timespec *ts)
return mode;
}

-notrace static int do_realtime_coarse(struct timespec *ts)
+notrace static void do_realtime_coarse(struct timespec *ts)
{
unsigned long seq;
do {
@@ -217,10 +217,9 @@ notrace static int do_realtime_coarse(struct timespec *ts)
ts->tv_sec = gtod->wall_time_coarse.tv_sec;
ts->tv_nsec = gtod->wall_time_coarse.tv_nsec;
} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
- return 0;
}

-notrace static int do_monotonic_coarse(struct timespec *ts)
+notrace static void do_monotonic_coarse(struct timespec *ts)
{
unsigned long seq;
do {
@@ -228,30 +227,32 @@ notrace static int do_monotonic_coarse(struct timespec *ts)
ts->tv_sec = gtod->monotonic_time_coarse.tv_sec;
ts->tv_nsec = gtod->monotonic_time_coarse.tv_nsec;
} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
-
- return 0;
}

notrace int __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
{
- int ret = VCLOCK_NONE;
-
switch (clock) {
case CLOCK_REALTIME:
- ret = do_realtime(ts);
+ if (do_realtime(ts) == VCLOCK_NONE)
+ goto fallback;
break;
case CLOCK_MONOTONIC:
- ret = do_monotonic(ts);
+ if (do_monotonic(ts) == VCLOCK_NONE)
+ goto fallback;
break;
case CLOCK_REALTIME_COARSE:
- return do_realtime_coarse(ts);
+ do_realtime_coarse(ts);
+ break;
case CLOCK_MONOTONIC_COARSE:
- return do_monotonic_coarse(ts);
+ do_monotonic_coarse(ts);
+ break;
+ default:
+ goto fallback;
}

- if (ret == VCLOCK_NONE)
- return vdso_fallback_gettime(clock, ts);
return 0;
+fallback:
+ return vdso_fallback_gettime(clock, ts);
}
int clock_gettime(clockid_t, struct timespec *)
__attribute__((weak, alias("__vdso_clock_gettime")));
--
1.8.5.5

Subject: [tip:x86/vdso] x86, vdso: Make vsyscall_gtod_data handling x86 generic

Commit-ID: 0d3ad8c4e6246637b289c22dfe12e3dbae516aef
Gitweb: http://git.kernel.org/tip/0d3ad8c4e6246637b289c22dfe12e3dbae516aef
Author: Stefani Seibold <[email protected]>
AuthorDate: Sun, 16 Feb 2014 22:52:39 +0100
Committer: H. Peter Anvin <[email protected]>
CommitDate: Sun, 16 Feb 2014 15:04:06 -0800

x86, vdso: Make vsyscall_gtod_data handling x86 generic

This patch moves the vsyscall_gtod_data handling out of vsyscall_64.c
into an additional file, vsyscall_gtod.c, to make the functionality
available for the x86 32-bit kernel.

It also adds a new vsyscall_32.c which sets up the VVAR page.

Signed-off-by: Stefani Seibold <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/Kconfig | 4 +--
arch/x86/include/asm/clocksource.h | 4 ---
arch/x86/include/asm/fixmap.h | 2 ++
arch/x86/include/asm/vvar.h | 12 ++++++--
arch/x86/kernel/Makefile | 3 +-
arch/x86/kernel/hpet.c | 4 ---
arch/x86/kernel/setup.c | 2 --
arch/x86/kernel/tsc.c | 2 --
arch/x86/kernel/vmlinux.lds.S | 3 --
arch/x86/kernel/vsyscall_32.c | 20 +++++++++++++
arch/x86/kernel/vsyscall_64.c | 45 -----------------------------
arch/x86/kernel/vsyscall_gtod.c | 59 ++++++++++++++++++++++++++++++++++++++
arch/x86/tools/relocs.c | 2 +-
13 files changed, 95 insertions(+), 67 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0af5250..0da3b39 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -107,9 +107,9 @@ config X86
select HAVE_ARCH_SOFT_DIRTY
select CLOCKSOURCE_WATCHDOG
select GENERIC_CLOCKEVENTS
- select ARCH_CLOCKSOURCE_DATA if X86_64
+ select ARCH_CLOCKSOURCE_DATA
select GENERIC_CLOCKEVENTS_BROADCAST if X86_64 || (X86_32 && X86_LOCAL_APIC)
- select GENERIC_TIME_VSYSCALL if X86_64
+ select GENERIC_TIME_VSYSCALL
select KTIME_SCALAR if X86_32
select GENERIC_STRNCPY_FROM_USER
select GENERIC_STRNLEN_USER
diff --git a/arch/x86/include/asm/clocksource.h b/arch/x86/include/asm/clocksource.h
index 16a57f4..eda81dc 100644
--- a/arch/x86/include/asm/clocksource.h
+++ b/arch/x86/include/asm/clocksource.h
@@ -3,8 +3,6 @@
#ifndef _ASM_X86_CLOCKSOURCE_H
#define _ASM_X86_CLOCKSOURCE_H

-#ifdef CONFIG_X86_64
-
#define VCLOCK_NONE 0 /* No vDSO clock available. */
#define VCLOCK_TSC 1 /* vDSO should use vread_tsc. */
#define VCLOCK_HPET 2 /* vDSO should use vread_hpet. */
@@ -14,6 +12,4 @@ struct arch_clocksource_data {
int vclock_mode;
};

-#endif /* CONFIG_X86_64 */
-
#endif /* _ASM_X86_CLOCKSOURCE_H */
diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index 7252cd3..094d0cc 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -75,6 +75,8 @@ enum fixed_addresses {
#ifdef CONFIG_X86_32
FIX_HOLE,
FIX_VDSO,
+ VVAR_PAGE,
+ VSYSCALL_HPET,
#else
VSYSCALL_LAST_PAGE,
VSYSCALL_FIRST_PAGE = VSYSCALL_LAST_PAGE
diff --git a/arch/x86/include/asm/vvar.h b/arch/x86/include/asm/vvar.h
index d76ac40..0a534ea 100644
--- a/arch/x86/include/asm/vvar.h
+++ b/arch/x86/include/asm/vvar.h
@@ -16,9 +16,6 @@
* you mess up, the linker will catch it.)
*/

-/* Base address of vvars. This is not ABI. */
-#define VVAR_ADDRESS (-10*1024*1024 - 4096)
-
#if defined(__VVAR_KERNEL_LDS)

/* The kernel linker script defines its own magic to put vvars in the
@@ -29,6 +26,15 @@

#else

+extern char __vvar_page;
+
+/* Base address of vvars. This is not ABI. */
+#ifdef CONFIG_X86_64
+#define VVAR_ADDRESS (-10*1024*1024 - 4096)
+#else
+#define VVAR_ADDRESS (&__vvar_page)
+#endif
+
#define DECLARE_VVAR(offset, type, name) \
static type const * const vvaraddr_ ## name = \
(void *)(VVAR_ADDRESS + (offset));
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index cb648c8..3282eda 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -26,7 +26,8 @@ obj-$(CONFIG_IRQ_WORK) += irq_work.o
obj-y += probe_roms.o
obj-$(CONFIG_X86_32) += i386_ksyms_32.o
obj-$(CONFIG_X86_64) += sys_x86_64.o x8664_ksyms_64.o
-obj-y += syscall_$(BITS).o
+obj-y += syscall_$(BITS).o vsyscall_gtod.o
+obj-$(CONFIG_X86_32) += vsyscall_32.o
obj-$(CONFIG_X86_64) += vsyscall_64.o
obj-$(CONFIG_X86_64) += vsyscall_emu_64.o
obj-$(CONFIG_SYSFS) += ksysfs.o
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index da85a8e..54263f0 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -74,9 +74,7 @@ static inline void hpet_writel(unsigned int d, unsigned int a)
static inline void hpet_set_mapping(void)
{
hpet_virt_address = ioremap_nocache(hpet_address, HPET_MMAP_SIZE);
-#ifdef CONFIG_X86_64
__set_fixmap(VSYSCALL_HPET, hpet_address, PAGE_KERNEL_VVAR_NOCACHE);
-#endif
}

static inline void hpet_clear_mapping(void)
@@ -752,9 +750,7 @@ static struct clocksource clocksource_hpet = {
.mask = HPET_MASK,
.flags = CLOCK_SOURCE_IS_CONTINUOUS,
.resume = hpet_resume_counter,
-#ifdef CONFIG_X86_64
.archdata = { .vclock_mode = VCLOCK_HPET },
-#endif
};

static int hpet_clocksource_register(void)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 06853e6..56ff330 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1182,9 +1182,7 @@ void __init setup_arch(char **cmdline_p)

tboot_probe();

-#ifdef CONFIG_X86_64
map_vsyscall();
-#endif

generic_apic_probe();

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index acb3b60..a99a490 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -988,9 +988,7 @@ static struct clocksource clocksource_tsc = {
.mask = CLOCKSOURCE_MASK(64),
.flags = CLOCK_SOURCE_IS_CONTINUOUS |
CLOCK_SOURCE_MUST_VERIFY,
-#ifdef CONFIG_X86_64
.archdata = { .vclock_mode = VCLOCK_TSC },
-#endif
};

void mark_tsc_unstable(char *reason)
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index da6b35a..1d4897b 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -147,7 +147,6 @@ SECTIONS
_edata = .;
} :data

-#ifdef CONFIG_X86_64

. = ALIGN(PAGE_SIZE);
__vvar_page = .;
@@ -169,8 +168,6 @@ SECTIONS

. = ALIGN(__vvar_page + PAGE_SIZE, PAGE_SIZE);

-#endif /* CONFIG_X86_64 */
-
/* Init code and data - will be freed after init */
. = ALIGN(PAGE_SIZE);
.init.begin : AT(ADDR(.init.begin) - LOAD_OFFSET) {
diff --git a/arch/x86/kernel/vsyscall_32.c b/arch/x86/kernel/vsyscall_32.c
new file mode 100644
index 0000000..4b94c47
--- /dev/null
+++ b/arch/x86/kernel/vsyscall_32.c
@@ -0,0 +1,20 @@
+/*
+ * Copyright (C) 2001 Andrea Arcangeli <[email protected]> SuSE
+ * Copyright 2003 Andi Kleen, SuSE Labs.
+ *
+ * Modified for x86 32 bit arch by Stefani Seibold <[email protected]>
+ *
+ * Thanks to [email protected] for some useful hint.
+ * Special thanks to Ingo Molnar for his early experience with
+ * a different vsyscall implementation for Linux/IA32 and for the name.
+ *
+ */
+
+#include <asm/vsyscall.h>
+#include <asm/pgtable.h>
+#include <asm/fixmap.h>
+
+void __init map_vsyscall(void)
+{
+ __set_fixmap(VVAR_PAGE, __pa_symbol(&__vvar_page), PAGE_KERNEL_VVAR);
+}
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 1f96f93..9ea2876 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -47,14 +47,12 @@
#include <asm/segment.h>
#include <asm/desc.h>
#include <asm/topology.h>
-#include <asm/vgtod.h>
#include <asm/traps.h>

#define CREATE_TRACE_POINTS
#include "vsyscall_trace.h"

DEFINE_VVAR(int, vgetcpu_mode);
-DEFINE_VVAR(struct vsyscall_gtod_data, vsyscall_gtod_data);

static enum { EMULATE, NATIVE, NONE } vsyscall_mode = EMULATE;

@@ -77,48 +75,6 @@ static int __init vsyscall_setup(char *str)
}
early_param("vsyscall", vsyscall_setup);

-void update_vsyscall_tz(void)
-{
- vsyscall_gtod_data.sys_tz = sys_tz;
-}
-
-void update_vsyscall(struct timekeeper *tk)
-{
- struct vsyscall_gtod_data *vdata = &vsyscall_gtod_data;
-
- write_seqcount_begin(&vdata->seq);
-
- /* copy vsyscall data */
- vdata->clock.vclock_mode = tk->clock->archdata.vclock_mode;
- vdata->clock.cycle_last = tk->clock->cycle_last;
- vdata->clock.mask = tk->clock->mask;
- vdata->clock.mult = tk->mult;
- vdata->clock.shift = tk->shift;
-
- vdata->wall_time_sec = tk->xtime_sec;
- vdata->wall_time_snsec = tk->xtime_nsec;
-
- vdata->monotonic_time_sec = tk->xtime_sec
- + tk->wall_to_monotonic.tv_sec;
- vdata->monotonic_time_snsec = tk->xtime_nsec
- + (tk->wall_to_monotonic.tv_nsec
- << tk->shift);
- while (vdata->monotonic_time_snsec >=
- (((u64)NSEC_PER_SEC) << tk->shift)) {
- vdata->monotonic_time_snsec -=
- ((u64)NSEC_PER_SEC) << tk->shift;
- vdata->monotonic_time_sec++;
- }
-
- vdata->wall_time_coarse.tv_sec = tk->xtime_sec;
- vdata->wall_time_coarse.tv_nsec = (long)(tk->xtime_nsec >> tk->shift);
-
- vdata->monotonic_time_coarse = timespec_add(vdata->wall_time_coarse,
- tk->wall_to_monotonic);
-
- write_seqcount_end(&vdata->seq);
-}
-
static void warn_bad_vsyscall(const char *level, struct pt_regs *regs,
const char *message)
{
@@ -374,7 +330,6 @@ void __init map_vsyscall(void)
{
extern char __vsyscall_page;
unsigned long physaddr_vsyscall = __pa_symbol(&__vsyscall_page);
- extern char __vvar_page;
unsigned long physaddr_vvar_page = __pa_symbol(&__vvar_page);

__set_fixmap(VSYSCALL_FIRST_PAGE, physaddr_vsyscall,
diff --git a/arch/x86/kernel/vsyscall_gtod.c b/arch/x86/kernel/vsyscall_gtod.c
new file mode 100644
index 0000000..b5a943d
--- /dev/null
+++ b/arch/x86/kernel/vsyscall_gtod.c
@@ -0,0 +1,59 @@
+/*
+ * Copyright (C) 2001 Andrea Arcangeli <[email protected]> SuSE
+ * Copyright 2003 Andi Kleen, SuSE Labs.
+ *
+ * Modified for x86 32 bit architecture by
+ * Stefani Seibold <[email protected]>
+ *
+ * Thanks to [email protected] for some useful hint.
+ * Special thanks to Ingo Molnar for his early experience with
+ * a different vsyscall implementation for Linux/IA32 and for the name.
+ *
+ */
+
+#include <linux/timekeeper_internal.h>
+#include <asm/vgtod.h>
+
+DEFINE_VVAR(struct vsyscall_gtod_data, vsyscall_gtod_data);
+
+void update_vsyscall_tz(void)
+{
+ vsyscall_gtod_data.sys_tz = sys_tz;
+}
+
+void update_vsyscall(struct timekeeper *tk)
+{
+ struct vsyscall_gtod_data *vdata = &vsyscall_gtod_data;
+
+ write_seqcount_begin(&vdata->seq);
+
+ /* copy vsyscall data */
+ vdata->clock.vclock_mode = tk->clock->archdata.vclock_mode;
+ vdata->clock.cycle_last = tk->clock->cycle_last;
+ vdata->clock.mask = tk->clock->mask;
+ vdata->clock.mult = tk->mult;
+ vdata->clock.shift = tk->shift;
+
+ vdata->wall_time_sec = tk->xtime_sec;
+ vdata->wall_time_snsec = tk->xtime_nsec;
+
+ vdata->monotonic_time_sec = tk->xtime_sec
+ + tk->wall_to_monotonic.tv_sec;
+ vdata->monotonic_time_snsec = tk->xtime_nsec
+ + (tk->wall_to_monotonic.tv_nsec
+ << tk->shift);
+ while (vdata->monotonic_time_snsec >=
+ (((u64)NSEC_PER_SEC) << tk->shift)) {
+ vdata->monotonic_time_snsec -=
+ ((u64)NSEC_PER_SEC) << tk->shift;
+ vdata->monotonic_time_sec++;
+ }
+
+ vdata->wall_time_coarse.tv_sec = tk->xtime_sec;
+ vdata->wall_time_coarse.tv_nsec = (long)(tk->xtime_nsec >> tk->shift);
+
+ vdata->monotonic_time_coarse = timespec_add(vdata->wall_time_coarse,
+ tk->wall_to_monotonic);
+
+ write_seqcount_end(&vdata->seq);
+}
diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index cfbdbdb..bbb1d22 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -69,8 +69,8 @@ static const char * const sym_regex_kernel[S_NSYMTYPES] = {
"__per_cpu_load|"
"init_per_cpu__.*|"
"__end_rodata_hpage_align|"
- "__vvar_page|"
#endif
+ "__vvar_page|"
"_end)$"
};

Subject: [tip:x86/vdso] mm: Add new func _install_special_mapping() to mmap.c

Commit-ID: 48be0cb586e850e3ff5c37fe9339f233f9c893e4
Gitweb: http://git.kernel.org/tip/48be0cb586e850e3ff5c37fe9339f233f9c893e4
Author: Stefani Seibold <[email protected]>
AuthorDate: Sun, 16 Feb 2014 22:52:40 +0100
Committer: H. Peter Anvin <[email protected]>
CommitDate: Sun, 16 Feb 2014 15:04:23 -0800

mm: Add new func _install_special_mapping() to mmap.c

_install_special_mapping() is the new base function for
install_special_mapping(). This function returns a pointer to the
created VMA or an error code wrapped in an ERR_PTR().

This new function will be needed by the x86 vdso 32-bit support to
map the additional vvar and hpet pages into the 32-bit address space.
This will be done with io_remap_pfn_range() and remap_pfn_range(),
which require a vm_area_struct.
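
As an illustration of the intended use, a hypothetical caller could
look like this (a sketch only, not part of the patch; mm, addr and
pfn are assumed to be set up by the caller):

	struct vm_area_struct *vma;

	/* reserve the range; no page array, the PTEs are installed below */
	vma = _install_special_mapping(mm, addr, PAGE_SIZE, VM_READ, NULL);
	if (IS_ERR(vma))
		return PTR_ERR(vma);

	/* the returned VMA can now be handed to remap_pfn_range() */
	return remap_pfn_range(vma, addr, pfn, PAGE_SIZE, PAGE_READONLY);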

Signed-off-by: Stefani Seibold <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
---
include/linux/mm.h | 3 +++
mm/mmap.c | 20 ++++++++++++++++----
2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index f28f46e..55342aa 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1740,6 +1740,9 @@ extern void set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file);
extern struct file *get_mm_exe_file(struct mm_struct *mm);

extern int may_expand_vm(struct mm_struct *mm, unsigned long npages);
+extern struct vm_area_struct *_install_special_mapping(struct mm_struct *mm,
+ unsigned long addr, unsigned long len,
+ unsigned long flags, struct page **pages);
extern int install_special_mapping(struct mm_struct *mm,
unsigned long addr, unsigned long len,
unsigned long flags, struct page **pages);
diff --git a/mm/mmap.c b/mm/mmap.c
index 20ff0c3..81ba54f 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2918,7 +2918,7 @@ static const struct vm_operations_struct special_mapping_vmops = {
* The array pointer and the pages it points to are assumed to stay alive
* for as long as this mapping might exist.
*/
-int install_special_mapping(struct mm_struct *mm,
+struct vm_area_struct *_install_special_mapping(struct mm_struct *mm,
unsigned long addr, unsigned long len,
unsigned long vm_flags, struct page **pages)
{
@@ -2927,7 +2927,7 @@ int install_special_mapping(struct mm_struct *mm,

vma = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL);
if (unlikely(vma == NULL))
- return -ENOMEM;
+ return ERR_PTR(-ENOMEM);

INIT_LIST_HEAD(&vma->anon_vma_chain);
vma->vm_mm = mm;
@@ -2948,11 +2948,23 @@ int install_special_mapping(struct mm_struct *mm,

perf_event_mmap(vma);

- return 0;
+ return vma;

out:
kmem_cache_free(vm_area_cachep, vma);
- return ret;
+ return ERR_PTR(ret);
+}
+
+int install_special_mapping(struct mm_struct *mm,
+ unsigned long addr, unsigned long len,
+ unsigned long vm_flags, struct page **pages)
+{
+ struct vm_area_struct *vma = _install_special_mapping(mm,
+ addr, len, vm_flags, pages);
+
+ if (IS_ERR(vma))
+ return PTR_ERR(vma);
+ return 0;
}

static DEFINE_MUTEX(mm_all_locks_mutex);

Subject: [tip:x86/vdso] x86, vdso: vclock_gettime.c __vdso_clock_gettime cleanup

Commit-ID: 3b19f50facf0488e193ebae00b864fdaeeb25dbb
Gitweb: http://git.kernel.org/tip/3b19f50facf0488e193ebae00b864fdaeeb25dbb
Author: Stefani Seibold <[email protected]>
AuthorDate: Sun, 16 Feb 2014 22:52:42 +0100
Committer: H. Peter Anvin <[email protected]>
CommitDate: Sun, 16 Feb 2014 15:06:47 -0800

x86, vdso: vclock_gettime.c __vdso_clock_gettime cleanup

This patch is a small code cleanup for the __vdso_clock_gettime()
function.

It removes the unneeded return values from do_monotonic_coarse() and
do_realtime_coarse() and adds a fallback label for invoking the kernel
clock_gettime() system call.

Signed-off-by: Stefani Seibold <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/vdso/vclock_gettime.c | 27 ++++++++++++++-------------
1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index bbc8065..fd074dd 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -209,7 +209,7 @@ notrace static int do_monotonic(struct timespec *ts)
return mode;
}

-notrace static int do_realtime_coarse(struct timespec *ts)
+notrace static void do_realtime_coarse(struct timespec *ts)
{
unsigned long seq;
do {
@@ -217,10 +217,9 @@ notrace static int do_realtime_coarse(struct timespec *ts)
ts->tv_sec = gtod->wall_time_coarse.tv_sec;
ts->tv_nsec = gtod->wall_time_coarse.tv_nsec;
} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
- return 0;
}

-notrace static int do_monotonic_coarse(struct timespec *ts)
+notrace static void do_monotonic_coarse(struct timespec *ts)
{
unsigned long seq;
do {
@@ -228,30 +227,32 @@ notrace static int do_monotonic_coarse(struct timespec *ts)
ts->tv_sec = gtod->monotonic_time_coarse.tv_sec;
ts->tv_nsec = gtod->monotonic_time_coarse.tv_nsec;
} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
-
- return 0;
}

notrace int __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
{
- int ret = VCLOCK_NONE;
-
switch (clock) {
case CLOCK_REALTIME:
- ret = do_realtime(ts);
+ if (do_realtime(ts) == VCLOCK_NONE)
+ goto fallback;
break;
case CLOCK_MONOTONIC:
- ret = do_monotonic(ts);
+ if (do_monotonic(ts) == VCLOCK_NONE)
+ goto fallback;
break;
case CLOCK_REALTIME_COARSE:
- return do_realtime_coarse(ts);
+ do_realtime_coarse(ts);
+ break;
case CLOCK_MONOTONIC_COARSE:
- return do_monotonic_coarse(ts);
+ do_monotonic_coarse(ts);
+ break;
+ default:
+ goto fallback;
}

- if (ret == VCLOCK_NONE)
- return vdso_fallback_gettime(clock, ts);
return 0;
+fallback:
+ return vdso_fallback_gettime(clock, ts);
}
int clock_gettime(clockid_t, struct timespec *)
__attribute__((weak, alias("__vdso_clock_gettime")));

Subject: [tip:x86/vdso] x86, vdso: Replace VVAR(vsyscall_gtod_data) by the gtod macro

Commit-ID: d3e68e3e3fed760169cef2fa95e73551f5d24022
Gitweb: http://git.kernel.org/tip/d3e68e3e3fed760169cef2fa95e73551f5d24022
Author: Stefani Seibold <[email protected]>
AuthorDate: Sun, 16 Feb 2014 22:52:43 +0100
Committer: H. Peter Anvin <[email protected]>
CommitDate: Sun, 16 Feb 2014 15:07:01 -0800

x86, vdso: Replace VVAR(vsyscall_gtod_data) by the gtod macro

There are currently more than 30 users of the gtod macro, so replace
the last direct VVAR(vsyscall_gtod_data) references with the gtod macro
as well.

Signed-off-by: Stefani Seibold <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/vdso/vclock_gettime.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index fd074dd..743f277 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -109,7 +109,7 @@ static notrace cycle_t vread_pvclock(int *mode)
*mode = VCLOCK_NONE;

/* refer to tsc.c read_tsc() comment for rationale */
- last = VVAR(vsyscall_gtod_data).clock.cycle_last;
+ last = gtod->clock.cycle_last;

if (likely(ret >= last))
return ret;
@@ -133,7 +133,7 @@ notrace static cycle_t vread_tsc(void)
rdtsc_barrier();
ret = (cycle_t)vget_cycles();

- last = VVAR(vsyscall_gtod_data).clock.cycle_last;
+ last = gtod->clock.cycle_last;

if (likely(ret >= last))
return ret;
@@ -288,7 +288,7 @@ int gettimeofday(struct timeval *, struct timezone *)
notrace time_t __vdso_time(time_t *t)
{
/* This is atomic on x86_64 so we don't need any locks. */
- time_t result = ACCESS_ONCE(VVAR(vsyscall_gtod_data).wall_time_sec);
+ time_t result = ACCESS_ONCE(gtod->wall_time_sec);

if (t)
*t = result;

Subject: [tip:x86/vdso] x86, vdso: Cleanup __vdso_gettimeofday()

Commit-ID: bada923abe5d8b015efe0e49ca47f76af853972d
Gitweb: http://git.kernel.org/tip/bada923abe5d8b015efe0e49ca47f76af853972d
Author: Stefani Seibold <[email protected]>
AuthorDate: Sun, 16 Feb 2014 22:52:44 +0100
Committer: H. Peter Anvin <[email protected]>
CommitDate: Sun, 16 Feb 2014 15:07:31 -0800

x86, vdso: Cleanup __vdso_gettimeofday()

This patch does a little cleanup of the __vdso_gettimeofday() function.

It kicks out an unneeded ret local variable and makes the code faster
if only the timezone is needed.

Signed-off-by: Stefani Seibold <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/vdso/vclock_gettime.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index 743f277..09dae4a 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -259,13 +259,12 @@ int clock_gettime(clockid_t, struct timespec *)

notrace int __vdso_gettimeofday(struct timeval *tv, struct timezone *tz)
{
- long ret = VCLOCK_NONE;
-
if (likely(tv != NULL)) {
BUILD_BUG_ON(offsetof(struct timeval, tv_usec) !=
offsetof(struct timespec, tv_nsec) ||
sizeof(*tv) != sizeof(struct timespec));
- ret = do_realtime((struct timespec *)tv);
+ if (unlikely(do_realtime((struct timespec *)tv) == VCLOCK_NONE))
+ return vdso_fallback_gtod(tv, tz);
tv->tv_usec /= 1000;
}
if (unlikely(tz != NULL)) {
@@ -274,8 +273,6 @@ notrace int __vdso_gettimeofday(struct timeval *tv, struct timezone *tz)
tz->tz_dsttime = gtod->sys_tz.tz_dsttime;
}

- if (ret == VCLOCK_NONE)
- return vdso_fallback_gtod(tv, tz);
return 0;
}
int gettimeofday(struct timeval *, struct timezone *)

Subject: [tip:x86/vdso] x86, vdso: Introduce VVAR macro for vdso32

Commit-ID: 995106bc0373be03295aa6e0e380dd33a3a37ea4
Gitweb: http://git.kernel.org/tip/995106bc0373be03295aa6e0e380dd33a3a37ea4
Author: Stefani Seibold <[email protected]>
AuthorDate: Sun, 16 Feb 2014 22:52:45 +0100
Committer: H. Peter Anvin <[email protected]>
CommitDate: Sun, 16 Feb 2014 15:07:45 -0800

x86, vdso: Introduce VVAR macro for vdso32

This patch revamps vvar.h to introduce the VVAR macro for vdso32.
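
For illustration, with BUILD_VDSO32 defined the declaration expands to
a hidden extern object instead of a pointer to a fixed address, so the
vdso code references the vvar page PC-relatively. A sketch of the
expansion for one vvar (the placement at the right offset is done by
the vdso linker script):

	/* DECLARE_VVAR(128, struct vsyscall_gtod_data, vsyscall_gtod_data)
	   becomes: */
	extern struct vsyscall_gtod_data vvar_vsyscall_gtod_data
		__attribute__((visibility("hidden")));

	/* and VVAR(vsyscall_gtod_data) simply names that object: */
	#define VVAR(name) (vvar_ ## name)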

Signed-off-by: Stefani Seibold <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/include/asm/vvar.h | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/vvar.h b/arch/x86/include/asm/vvar.h
index 0a534ea..52c79ff 100644
--- a/arch/x86/include/asm/vvar.h
+++ b/arch/x86/include/asm/vvar.h
@@ -26,6 +26,15 @@

#else

+#ifdef BUILD_VDSO32
+
+#define DECLARE_VVAR(offset, type, name) \
+ extern type vvar_ ## name __attribute__((visibility("hidden")));
+
+#define VVAR(name) (vvar_ ## name)
+
+#else
+
extern char __vvar_page;

/* Base address of vvars. This is not ABI. */
@@ -39,12 +48,13 @@ extern char __vvar_page;
static type const * const vvaraddr_ ## name = \
(void *)(VVAR_ADDRESS + (offset));

+#define VVAR(name) (*vvaraddr_ ## name)
+#endif
+
#define DEFINE_VVAR(type, name) \
type name \
__attribute__((section(".vvar_" #name), aligned(16))) __visible

-#define VVAR(name) (*vvaraddr_ ## name)
-
#endif

/* DECLARE_VVAR(offset, type, name) */

Subject: [tip:x86/vdso] x86, vdso: Add 32-bit VDSO time support for the 32-bit kernel

Commit-ID: feea5bae36ba8fcd7095e1b23cc2c537f4d24562
Gitweb: http://git.kernel.org/tip/feea5bae36ba8fcd7095e1b23cc2c537f4d24562
Author: Stefani Seibold <[email protected]>
AuthorDate: Sun, 16 Feb 2014 22:52:46 +0100
Committer: H. Peter Anvin <[email protected]>
CommitDate: Sun, 16 Feb 2014 15:08:18 -0800

x86, vdso: Add 32-bit VDSO time support for the 32-bit kernel

This patch adds the time support for the 32-bit VDSO to the 32-bit
kernel.

For 32-bit programs running on a 32-bit kernel, the same mechanism is
used as for 64-bit programs running on a 64-bit kernel.
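
A quick way to see the effect from user space is to compare the libc
wrapper (which can be routed through the VDSO) with the raw system
call; the micro-benchmark below is a hedged sketch, not part of the
patch:

	#include <stdio.h>
	#include <time.h>
	#include <unistd.h>
	#include <sys/syscall.h>

	#define LOOPS 1000000

	static long long bench(int raw)
	{
		struct timespec a, b, ts;
		int i;

		clock_gettime(CLOCK_MONOTONIC, &a);
		for (i = 0; i < LOOPS; i++) {
			if (raw)	/* always enters the kernel */
				syscall(SYS_clock_gettime, CLOCK_MONOTONIC, &ts);
			else		/* may be satisfied by the VDSO */
				clock_gettime(CLOCK_MONOTONIC, &ts);
		}
		clock_gettime(CLOCK_MONOTONIC, &b);
		return (b.tv_sec - a.tv_sec) * 1000000000LL +
		       (b.tv_nsec - a.tv_nsec);
	}

	int main(void)
	{
		printf("syscall: %lld ns/call\n", bench(1) / LOOPS);
		printf("libc:    %lld ns/call\n", bench(0) / LOOPS);
		return 0;
	}

Compile with e.g. gcc -m32 -O2 bench.c -lrt (older glibc versions keep
clock_gettime() in librt).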

Signed-off-by: Stefani Seibold <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/include/asm/vdso.h | 3 ++
arch/x86/include/asm/vdso32.h | 11 ++++++
arch/x86/vdso/Makefile | 8 ++++
arch/x86/vdso/vclock_gettime.c | 74 ++++++++++++++++++++++++++++++++---
arch/x86/vdso/vdso-layout.lds.S | 22 +++++++++++
arch/x86/vdso/vdso32-setup.c | 53 ++++++++++++++++++++++---
arch/x86/vdso/vdso32/vclock_gettime.c | 35 +++++++++++++++++
arch/x86/vdso/vdso32/vdso32.lds.S | 9 +++++
8 files changed, 203 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index fddb53d..fe3cef9 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -2,6 +2,9 @@
#define _ASM_X86_VDSO_H

#if defined CONFIG_X86_32 || defined CONFIG_COMPAT
+
+#include <asm/vdso32.h>
+
extern const char VDSO32_PRELINK[];

/*
diff --git a/arch/x86/include/asm/vdso32.h b/arch/x86/include/asm/vdso32.h
new file mode 100644
index 0000000..7efb701
--- /dev/null
+++ b/arch/x86/include/asm/vdso32.h
@@ -0,0 +1,11 @@
+#ifndef _ASM_X86_VDSO32_H
+#define _ASM_X86_VDSO32_H
+
+#define VDSO_BASE_PAGE 0
+#define VDSO_VVAR_PAGE 1
+#define VDSO_HPET_PAGE 2
+#define VDSO_PAGES 3
+#define VDSO_PREV_PAGES 2
+#define VDSO_OFFSET(x) ((x) * PAGE_SIZE)
+
+#endif
diff --git a/arch/x86/vdso/Makefile b/arch/x86/vdso/Makefile
index fd14be1..92daaa6 100644
--- a/arch/x86/vdso/Makefile
+++ b/arch/x86/vdso/Makefile
@@ -145,8 +145,16 @@ KBUILD_AFLAGS_32 := $(filter-out -m64,$(KBUILD_AFLAGS))
$(vdso32-images:%=$(obj)/%.dbg): KBUILD_AFLAGS = $(KBUILD_AFLAGS_32)
$(vdso32-images:%=$(obj)/%.dbg): asflags-$(CONFIG_X86_64) += -m32

+KBUILD_CFLAGS_32 := $(filter-out -m64,$(KBUILD_CFLAGS))
+KBUILD_CFLAGS_32 := $(filter-out -mcmodel=kernel,$(KBUILD_CFLAGS_32))
+KBUILD_CFLAGS_32 := $(filter-out -fno-pic,$(KBUILD_CFLAGS_32))
+KBUILD_CFLAGS_32 := $(filter-out -mfentry,$(KBUILD_CFLAGS_32))
+KBUILD_CFLAGS_32 += -m32 -msoft-float -mregparm=3 -freg-struct-return -fpic
+$(vdso32-images:%=$(obj)/%.dbg): KBUILD_CFLAGS = $(KBUILD_CFLAGS_32)
+
$(vdso32-images:%=$(obj)/%.dbg): $(obj)/vdso32-%.so.dbg: FORCE \
$(obj)/vdso32/vdso32.lds \
+ $(obj)/vdso32/vclock_gettime.o \
$(obj)/vdso32/note.o \
$(obj)/vdso32/%.o
$(call if_changed,vdso)
diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index 09dae4a..fcbc974 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -4,6 +4,9 @@
*
* Fast user context implementation of clock_gettime, gettimeofday, and time.
*
+ * 32 Bit compat layer by Stefani Seibold <[email protected]>
+ * sponsored by Rohde & Schwarz GmbH & Co. KG Munich/Germany
+ *
* The code should have no internal unresolved relocations.
* Check with readelf after changing.
*/
@@ -12,13 +15,11 @@
#define DISABLE_BRANCH_PROFILING

#include <linux/kernel.h>
-#include <linux/posix-timers.h>
-#include <linux/time.h>
+#include <uapi/linux/time.h>
#include <linux/string.h>
#include <asm/vsyscall.h>
#include <asm/fixmap.h>
#include <asm/vgtod.h>
-#include <asm/timex.h>
#include <asm/hpet.h>
#include <asm/unistd.h>
#include <asm/io.h>
@@ -26,6 +27,12 @@

#define gtod (&VVAR(vsyscall_gtod_data))

+extern int __vdso_clock_gettime(clockid_t clock, struct timespec *ts);
+extern int __vdso_gettimeofday(struct timeval *tv, struct timezone *tz);
+extern time_t __vdso_time(time_t *t);
+
+#ifndef BUILD_VDSO32
+
static notrace cycle_t vread_hpet(void)
{
return readl((const void __iomem *)fix_to_virt(VSYSCALL_HPET) + HPET_COUNTER);
@@ -118,6 +125,59 @@ static notrace cycle_t vread_pvclock(int *mode)
}
#endif

+#else
+
+extern u8 hpet_page
+ __attribute__((visibility("hidden")));
+
+#ifdef CONFIG_HPET_TIMER
+static notrace cycle_t vread_hpet(void)
+{
+ return readl((const void __iomem *)(&hpet_page + HPET_COUNTER));
+}
+#endif
+
+notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
+{
+ long ret;
+
+ asm(
+ "mov %%ebx, %%edx \n"
+ "mov %2, %%ebx \n"
+ "call VDSO32_vsyscall \n"
+ "mov %%edx, %%ebx \n"
+ : "=a" (ret)
+ : "0" (__NR_clock_gettime), "g" (clock), "c" (ts)
+ : "memory", "edx");
+ return ret;
+}
+
+notrace static long vdso_fallback_gtod(struct timeval *tv, struct timezone *tz)
+{
+ long ret;
+
+ asm(
+ "mov %%ebx, %%edx \n"
+ "mov %2, %%ebx \n"
+ "call VDSO32_vsyscall \n"
+ "mov %%edx, %%ebx \n"
+ : "=a" (ret)
+ : "0" (__NR_gettimeofday), "g" (tv), "c" (tz)
+ : "memory", "edx");
+ return ret;
+}
+
+#ifdef CONFIG_PARAVIRT_CLOCK
+
+static notrace cycle_t vread_pvclock(int *mode)
+{
+ *mode = VCLOCK_NONE;
+ return 0;
+}
+#endif
+
+#endif
+
notrace static cycle_t vread_tsc(void)
{
cycle_t ret;
@@ -131,7 +191,7 @@ notrace static cycle_t vread_tsc(void)
* but no one has ever seen it happen.
*/
rdtsc_barrier();
- ret = (cycle_t)vget_cycles();
+ ret = (cycle_t)__native_read_tsc();

last = gtod->clock.cycle_last;

@@ -152,12 +212,14 @@ notrace static cycle_t vread_tsc(void)

notrace static inline u64 vgetsns(int *mode)
{
- long v;
+ u64 v;
cycles_t cycles;
if (gtod->clock.vclock_mode == VCLOCK_TSC)
cycles = vread_tsc();
+#ifdef CONFIG_HPET_TIMER
else if (gtod->clock.vclock_mode == VCLOCK_HPET)
cycles = vread_hpet();
+#endif
#ifdef CONFIG_PARAVIRT_CLOCK
else if (gtod->clock.vclock_mode == VCLOCK_PVCLOCK)
cycles = vread_pvclock(mode);
@@ -284,7 +346,7 @@ int gettimeofday(struct timeval *, struct timezone *)
*/
notrace time_t __vdso_time(time_t *t)
{
- /* This is atomic on x86_64 so we don't need any locks. */
+ /* This is atomic on x86 so we don't need any locks. */
time_t result = ACCESS_ONCE(gtod->wall_time_sec);

if (t)
diff --git a/arch/x86/vdso/vdso-layout.lds.S b/arch/x86/vdso/vdso-layout.lds.S
index 634a2cf..1261437 100644
--- a/arch/x86/vdso/vdso-layout.lds.S
+++ b/arch/x86/vdso/vdso-layout.lds.S
@@ -6,6 +6,24 @@

SECTIONS
{
+#ifdef BUILD_VDSO32
+#include <asm/vdso32.h>
+
+ .hpet_sect : {
+ hpet_page = . - VDSO_OFFSET(VDSO_HPET_PAGE);
+ } :text :hpet_sect
+
+ .vvar_sect : {
+ vvar = . - VDSO_OFFSET(VDSO_VVAR_PAGE);
+
+ /* Place all vvars at the offsets in asm/vvar.h. */
+#define EMIT_VVAR(name, offset) vvar_ ## name = vvar + offset;
+#define __VVAR_KERNEL_LDS
+#include <asm/vvar.h>
+#undef __VVAR_KERNEL_LDS
+#undef EMIT_VVAR
+ } :text :vvar_sect
+#endif
. = VDSO_PRELINK + SIZEOF_HEADERS;

.hash : { *(.hash) } :text
@@ -61,4 +79,8 @@ PHDRS
dynamic PT_DYNAMIC FLAGS(4); /* PF_R */
note PT_NOTE FLAGS(4); /* PF_R */
eh_frame_hdr PT_GNU_EH_FRAME;
+#ifdef BUILD_VDSO32
+ vvar_sect PT_NULL FLAGS(4); /* PF_R */
+ hpet_sect PT_NULL FLAGS(4); /* PF_R */
+#endif
}
diff --git a/arch/x86/vdso/vdso32-setup.c b/arch/x86/vdso/vdso32-setup.c
index d6bfb87..9b57770 100644
--- a/arch/x86/vdso/vdso32-setup.c
+++ b/arch/x86/vdso/vdso32-setup.c
@@ -25,6 +25,9 @@
#include <asm/tlbflush.h>
#include <asm/vdso.h>
#include <asm/proto.h>
+#include <asm/fixmap.h>
+#include <asm/hpet.h>
+#include <asm/vvar.h>

enum {
VDSO_DISABLED = 0,
@@ -193,7 +196,7 @@ static __init void relocate_vdso(Elf32_Ehdr *ehdr)
}
}

-static struct page *vdso32_pages[1];
+static struct page *vdso32_pages[VDSO_PAGES];

#ifdef CONFIG_X86_64

@@ -310,6 +313,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
unsigned long addr;
int ret = 0;
bool compat;
+ struct vm_area_struct *vma;

#ifdef CONFIG_X86_X32_ABI
if (test_thread_flag(TIF_X32))
@@ -330,11 +334,13 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
if (compat)
addr = VDSO_HIGH_BASE;
else {
- addr = get_unmapped_area(NULL, 0, PAGE_SIZE, 0, 0);
+ addr = get_unmapped_area(NULL, 0, VDSO_OFFSET(VDSO_PAGES), 0, 0);
if (IS_ERR_VALUE(addr)) {
ret = addr;
goto up_fail;
}
+
+ addr += VDSO_OFFSET(VDSO_PREV_PAGES);
}

current->mm->context.vdso = (void *)addr;
@@ -343,13 +349,48 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
/*
* MAYWRITE to allow gdb to COW and set breakpoints
*/
- ret = install_special_mapping(mm, addr, PAGE_SIZE,
- VM_READ|VM_EXEC|
- VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
- vdso32_pages);
+ ret = install_special_mapping(mm,
+ addr,
+ VDSO_OFFSET(VDSO_PAGES - VDSO_PREV_PAGES),
+ VM_READ|VM_EXEC|
+ VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
+ vdso32_pages);

if (ret)
goto up_fail;
+
+ vma = _install_special_mapping(mm,
+ addr - VDSO_OFFSET(VDSO_PREV_PAGES),
+ VDSO_OFFSET(VDSO_PREV_PAGES),
+ VM_READ,
+ NULL);
+
+ if (IS_ERR(vma)) {
+ ret = PTR_ERR(vma);
+ goto up_fail;
+ }
+
+ ret = remap_pfn_range(vma,
+ addr - VDSO_OFFSET(VDSO_VVAR_PAGE),
+ __pa_symbol(&__vvar_page) >> PAGE_SHIFT,
+ PAGE_SIZE,
+ PAGE_READONLY);
+
+ if (ret)
+ goto up_fail;
+
+#ifdef CONFIG_HPET_TIMER
+ if (hpet_address) {
+ ret = io_remap_pfn_range(vma,
+ addr - VDSO_OFFSET(VDSO_HPET_PAGE),
+ hpet_address >> PAGE_SHIFT,
+ PAGE_SIZE,
+ pgprot_noncached(PAGE_READONLY));
+
+ if (ret)
+ goto up_fail;
+ }
+#endif
}

current_thread_info()->sysenter_return =
diff --git a/arch/x86/vdso/vdso32/vclock_gettime.c b/arch/x86/vdso/vdso32/vclock_gettime.c
new file mode 100644
index 0000000..1034dea
--- /dev/null
+++ b/arch/x86/vdso/vdso32/vclock_gettime.c
@@ -0,0 +1,35 @@
+#define BUILD_VDSO32
+
+#ifdef CONFIG_X86_64
+
+/*
+ * Due the -m32 compilation, there will be a lot of
+ * "warning: integer constant is too large for 'unsigned long' type",
+ * because an unsigned long is only 32 bit.
+ */
+
+/*
+ * Prevents the include of arch/x86/include/asm/page.h, which will generate
+ * a lot of warnings.
+ */
+#define _ASM_X86_PAGE_H
+
+/*
+ * The unneeded inline function phys_to_virt() in arch/x86/include/asm/io.h
+ * depends on the __va(), which comes from arch/x86/include/asm/page.h.
+ * So add a dummy for this.
+ *
+ * It is save, since this functions not used in arch/x86/vdso/vclock_gettime.c
+ */
+#define __va(x) 0
+
+/*
+ * The define of CONFIG_ILLEGAL_POINTER_VALUE is also to prevent the
+ * "warning: integer constant is too large..."
+ */
+#undef CONFIG_ILLEGAL_POINTER_VALUE
+#define CONFIG_ILLEGAL_POINTER_VALUE 0
+
+#endif
+
+#include "../vclock_gettime.c"
diff --git a/arch/x86/vdso/vdso32/vdso32.lds.S b/arch/x86/vdso/vdso32/vdso32.lds.S
index 976124b..bc8bf6d 100644
--- a/arch/x86/vdso/vdso32/vdso32.lds.S
+++ b/arch/x86/vdso/vdso32/vdso32.lds.S
@@ -8,6 +8,9 @@
* values visible using the asm-x86/vdso.h macros from the kernel proper.
*/

+#include <asm/page.h>
+
+#define BUILD_VDSO32
#define VDSO_PRELINK 0
#include "../vdso-layout.lds.S"

@@ -24,6 +27,9 @@ VERSION
__kernel_vsyscall;
__kernel_sigreturn;
__kernel_rt_sigreturn;
+ __vdso_clock_gettime;
+ __vdso_gettimeofday;
+ __vdso_time;
local: *;
};
}
@@ -35,3 +41,6 @@ VDSO32_PRELINK = VDSO_PRELINK;
VDSO32_vsyscall = __kernel_vsyscall;
VDSO32_sigreturn = __kernel_sigreturn;
VDSO32_rt_sigreturn = __kernel_rt_sigreturn;
+VDSO32_clock_gettime = clock_gettime;
+VDSO32_gettimeofday = gettimeofday;
+VDSO32_time = time;

Subject: [tip:x86/vdso] x86, vdso: Add 32-bit VDSO time support for the 64-bit kernel

Commit-ID: 249adfe2c86766eaa739d342525e55a96bf9efa7
Gitweb: http://git.kernel.org/tip/249adfe2c86766eaa739d342525e55a96bf9efa7
Author: Stefani Seibold <[email protected]>
AuthorDate: Sun, 16 Feb 2014 22:52:47 +0100
Committer: H. Peter Anvin <[email protected]>
CommitDate: Sun, 16 Feb 2014 15:08:29 -0800

x86, vdso: Add 32-bit VDSO time support for the 64-bit kernel

This patch adds the VDSO time support for the IA32 emulation layer.

Due to the nature of the kernel headers and the LP64 compiler, where
the sizes of a long and a pointer differ from those of a 32-bit
compiler, some type hacking is necessary for optimal performance.

The vsyscall_gtod_data structure must be rearranged to serve 32- and
64-bit code access at the same time:

- The seqcount_t was replaced by an unsigned; this makes
vsyscall_gtod_data independent of the kernel configuration and
internal functions.
- All kernel-internal structures are replaced by fixed-size elements
which work for both 32- and 64-bit access.
- The inner struct clock was removed to pack the whole struct.

The "unsigned seq" is handled by functions derived from seqcount_t.

Signed-off-by: Stefani Seibold <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/include/asm/vgtod.h | 69 ++++++++++++++++++++++++++++-------
arch/x86/include/asm/vvar.h | 5 +++
arch/x86/kernel/vsyscall_gtod.c | 34 +++++++++++------
arch/x86/vdso/vclock_gettime.c | 68 +++++++++++++++++-----------------
arch/x86/vdso/vdso32/vclock_gettime.c | 33 +++++++++++++++++
5 files changed, 149 insertions(+), 60 deletions(-)

diff --git a/arch/x86/include/asm/vgtod.h b/arch/x86/include/asm/vgtod.h
index 46e24d3..abb9e45 100644
--- a/arch/x86/include/asm/vgtod.h
+++ b/arch/x86/include/asm/vgtod.h
@@ -4,27 +4,70 @@
#include <asm/vsyscall.h>
#include <linux/clocksource.h>

+#ifdef CONFIG_X86_64
+typedef u64 gtod_long_t;
+#else
+typedef u32 gtod_long_t;
+#endif
+/*
+ * vsyscall_gtod_data will be accessed by 32 and 64 bit code at the same time
+ * so be carefull by modifying this structure.
+ */
struct vsyscall_gtod_data {
- seqcount_t seq;
+ unsigned seq;

- struct { /* extract of a clocksource struct */
- int vclock_mode;
- cycle_t cycle_last;
- cycle_t mask;
- u32 mult;
- u32 shift;
- } clock;
+ int vclock_mode;
+ cycle_t cycle_last;
+ cycle_t mask;
+ u32 mult;
+ u32 shift;

/* open coded 'struct timespec' */
- time_t wall_time_sec;
u64 wall_time_snsec;
+ gtod_long_t wall_time_sec;
+ gtod_long_t monotonic_time_sec;
u64 monotonic_time_snsec;
- time_t monotonic_time_sec;
+ gtod_long_t wall_time_coarse_sec;
+ gtod_long_t wall_time_coarse_nsec;
+ gtod_long_t monotonic_time_coarse_sec;
+ gtod_long_t monotonic_time_coarse_nsec;

- struct timezone sys_tz;
- struct timespec wall_time_coarse;
- struct timespec monotonic_time_coarse;
+ int tz_minuteswest;
+ int tz_dsttime;
};
extern struct vsyscall_gtod_data vsyscall_gtod_data;

+static inline unsigned gtod_read_begin(const struct vsyscall_gtod_data *s)
+{
+ unsigned ret;
+
+repeat:
+ ret = ACCESS_ONCE(s->seq);
+ if (unlikely(ret & 1)) {
+ cpu_relax();
+ goto repeat;
+ }
+ smp_rmb();
+ return ret;
+}
+
+static inline int gtod_read_retry(const struct vsyscall_gtod_data *s,
+ unsigned start)
+{
+ smp_rmb();
+ return unlikely(s->seq != start);
+}
+
+static inline void gtod_write_begin(struct vsyscall_gtod_data *s)
+{
+ ++s->seq;
+ smp_wmb();
+}
+
+static inline void gtod_write_end(struct vsyscall_gtod_data *s)
+{
+ smp_wmb();
+ ++s->seq;
+}
+
#endif /* _ASM_X86_VGTOD_H */
diff --git a/arch/x86/include/asm/vvar.h b/arch/x86/include/asm/vvar.h
index 52c79ff..081d909 100644
--- a/arch/x86/include/asm/vvar.h
+++ b/arch/x86/include/asm/vvar.h
@@ -16,6 +16,9 @@
* you mess up, the linker will catch it.)
*/

+#ifndef _ASM_X86_VVAR_H
+#define _ASM_X86_VVAR_H
+
#if defined(__VVAR_KERNEL_LDS)

/* The kernel linker script defines its own magic to put vvars in the
@@ -64,3 +67,5 @@ DECLARE_VVAR(16, int, vgetcpu_mode)
DECLARE_VVAR(128, struct vsyscall_gtod_data, vsyscall_gtod_data)

#undef DECLARE_VVAR
+
+#endif
diff --git a/arch/x86/kernel/vsyscall_gtod.c b/arch/x86/kernel/vsyscall_gtod.c
index b5a943d..973dcc4 100644
--- a/arch/x86/kernel/vsyscall_gtod.c
+++ b/arch/x86/kernel/vsyscall_gtod.c
@@ -4,6 +4,7 @@
*
* Modified for x86 32 bit architecture by
* Stefani Seibold <[email protected]>
+ * sponsored by Rohde & Schwarz GmbH & Co. KG Munich/Germany
*
* Thanks to [email protected] for some useful hint.
* Special thanks to Ingo Molnar for his early experience with
@@ -18,21 +19,22 @@ DEFINE_VVAR(struct vsyscall_gtod_data, vsyscall_gtod_data);

void update_vsyscall_tz(void)
{
- vsyscall_gtod_data.sys_tz = sys_tz;
+ vsyscall_gtod_data.tz_minuteswest = sys_tz.tz_minuteswest;
+ vsyscall_gtod_data.tz_dsttime = sys_tz.tz_dsttime;
}

void update_vsyscall(struct timekeeper *tk)
{
struct vsyscall_gtod_data *vdata = &vsyscall_gtod_data;

- write_seqcount_begin(&vdata->seq);
+ gtod_write_begin(vdata);

/* copy vsyscall data */
- vdata->clock.vclock_mode = tk->clock->archdata.vclock_mode;
- vdata->clock.cycle_last = tk->clock->cycle_last;
- vdata->clock.mask = tk->clock->mask;
- vdata->clock.mult = tk->mult;
- vdata->clock.shift = tk->shift;
+ vdata->vclock_mode = tk->clock->archdata.vclock_mode;
+ vdata->cycle_last = tk->clock->cycle_last;
+ vdata->mask = tk->clock->mask;
+ vdata->mult = tk->mult;
+ vdata->shift = tk->shift;

vdata->wall_time_sec = tk->xtime_sec;
vdata->wall_time_snsec = tk->xtime_nsec;
@@ -49,11 +51,19 @@ void update_vsyscall(struct timekeeper *tk)
vdata->monotonic_time_sec++;
}

- vdata->wall_time_coarse.tv_sec = tk->xtime_sec;
- vdata->wall_time_coarse.tv_nsec = (long)(tk->xtime_nsec >> tk->shift);
+ vdata->wall_time_coarse_sec = tk->xtime_sec;
+ vdata->wall_time_coarse_nsec = (long)(tk->xtime_nsec >> tk->shift);

- vdata->monotonic_time_coarse = timespec_add(vdata->wall_time_coarse,
- tk->wall_to_monotonic);
+ vdata->monotonic_time_coarse_sec =
+ vdata->wall_time_coarse_sec + tk->wall_to_monotonic.tv_sec;
+ vdata->monotonic_time_coarse_nsec =
+ vdata->wall_time_coarse_nsec + tk->wall_to_monotonic.tv_nsec;

- write_seqcount_end(&vdata->seq);
+ while (vdata->monotonic_time_coarse_nsec >= NSEC_PER_SEC) {
+ vdata->monotonic_time_coarse_nsec -= NSEC_PER_SEC;
+ vdata->monotonic_time_coarse_sec++;
+ }
+
+ gtod_write_end(vdata);
}
+
diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index fcbc974..b2c5d39 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -116,7 +116,7 @@ static notrace cycle_t vread_pvclock(int *mode)
*mode = VCLOCK_NONE;

/* refer to tsc.c read_tsc() comment for rationale */
- last = gtod->clock.cycle_last;
+ last = gtod->cycle_last;

if (likely(ret >= last))
return ret;
@@ -147,7 +147,7 @@ notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
"call VDSO32_vsyscall \n"
"mov %%edx, %%ebx \n"
: "=a" (ret)
- : "0" (__NR_clock_gettime), "g" (clock), "c" (ts)
+ : "0" (__NR_ia32_clock_gettime), "g" (clock), "c" (ts)
: "memory", "edx");
return ret;
}
@@ -162,7 +162,7 @@ notrace static long vdso_fallback_gtod(struct timeval *tv, struct timezone *tz)
"call VDSO32_vsyscall \n"
"mov %%edx, %%ebx \n"
: "=a" (ret)
- : "0" (__NR_gettimeofday), "g" (tv), "c" (tz)
+ : "0" (__NR_ia32_gettimeofday), "g" (tv), "c" (tz)
: "memory", "edx");
return ret;
}
@@ -193,7 +193,7 @@ notrace static cycle_t vread_tsc(void)
rdtsc_barrier();
ret = (cycle_t)__native_read_tsc();

- last = gtod->clock.cycle_last;
+ last = gtod->cycle_last;

if (likely(ret >= last))
return ret;
@@ -214,20 +214,20 @@ notrace static inline u64 vgetsns(int *mode)
{
u64 v;
cycles_t cycles;
- if (gtod->clock.vclock_mode == VCLOCK_TSC)
+ if (gtod->vclock_mode == VCLOCK_TSC)
cycles = vread_tsc();
#ifdef CONFIG_HPET_TIMER
- else if (gtod->clock.vclock_mode == VCLOCK_HPET)
+ else if (gtod->vclock_mode == VCLOCK_HPET)
cycles = vread_hpet();
#endif
#ifdef CONFIG_PARAVIRT_CLOCK
- else if (gtod->clock.vclock_mode == VCLOCK_PVCLOCK)
+ else if (gtod->vclock_mode == VCLOCK_PVCLOCK)
cycles = vread_pvclock(mode);
#endif
else
return 0;
- v = (cycles - gtod->clock.cycle_last) & gtod->clock.mask;
- return v * gtod->clock.mult;
+ v = (cycles - gtod->cycle_last) & gtod->mask;
+ return v * gtod->mult;
}

/* Code size doesn't matter (vdso is 4k anyway) and this is faster. */
@@ -237,17 +237,18 @@ notrace static int __always_inline do_realtime(struct timespec *ts)
u64 ns;
int mode;

- ts->tv_nsec = 0;
do {
- seq = raw_read_seqcount_begin(&gtod->seq);
- mode = gtod->clock.vclock_mode;
+ seq = gtod_read_begin(gtod);
+ mode = gtod->vclock_mode;
ts->tv_sec = gtod->wall_time_sec;
ns = gtod->wall_time_snsec;
ns += vgetsns(&mode);
- ns >>= gtod->clock.shift;
- } while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
+ ns >>= gtod->shift;
+ } while (unlikely(gtod_read_retry(gtod, seq)));
+
+ ts->tv_sec += __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
+ ts->tv_nsec = ns;

- timespec_add_ns(ts, ns);
return mode;
}

@@ -257,16 +258,17 @@ notrace static int do_monotonic(struct timespec *ts)
u64 ns;
int mode;

- ts->tv_nsec = 0;
do {
- seq = raw_read_seqcount_begin(&gtod->seq);
- mode = gtod->clock.vclock_mode;
+ seq = gtod_read_begin(gtod);
+ mode = gtod->vclock_mode;
ts->tv_sec = gtod->monotonic_time_sec;
ns = gtod->monotonic_time_snsec;
ns += vgetsns(&mode);
- ns >>= gtod->clock.shift;
- } while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
- timespec_add_ns(ts, ns);
+ ns >>= gtod->shift;
+ } while (unlikely(gtod_read_retry(gtod, seq)));
+
+ ts->tv_sec += __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
+ ts->tv_nsec = ns;

return mode;
}
@@ -275,20 +277,20 @@ notrace static void do_realtime_coarse(struct timespec *ts)
{
unsigned long seq;
do {
- seq = raw_read_seqcount_begin(&gtod->seq);
- ts->tv_sec = gtod->wall_time_coarse.tv_sec;
- ts->tv_nsec = gtod->wall_time_coarse.tv_nsec;
- } while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
+ seq = gtod_read_begin(gtod);
+ ts->tv_sec = gtod->wall_time_coarse_sec;
+ ts->tv_nsec = gtod->wall_time_coarse_nsec;
+ } while (unlikely(gtod_read_retry(gtod, seq)));
}

notrace static void do_monotonic_coarse(struct timespec *ts)
{
unsigned long seq;
do {
- seq = raw_read_seqcount_begin(&gtod->seq);
- ts->tv_sec = gtod->monotonic_time_coarse.tv_sec;
- ts->tv_nsec = gtod->monotonic_time_coarse.tv_nsec;
- } while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
+ seq = gtod_read_begin(gtod);
+ ts->tv_sec = gtod->monotonic_time_coarse_sec;
+ ts->tv_nsec = gtod->monotonic_time_coarse_nsec;
+ } while (unlikely(gtod_read_retry(gtod, seq)));
}

notrace int __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
@@ -322,17 +324,13 @@ int clock_gettime(clockid_t, struct timespec *)
notrace int __vdso_gettimeofday(struct timeval *tv, struct timezone *tz)
{
if (likely(tv != NULL)) {
- BUILD_BUG_ON(offsetof(struct timeval, tv_usec) !=
- offsetof(struct timespec, tv_nsec) ||
- sizeof(*tv) != sizeof(struct timespec));
if (unlikely(do_realtime((struct timespec *)tv) == VCLOCK_NONE))
return vdso_fallback_gtod(tv, tz);
tv->tv_usec /= 1000;
}
if (unlikely(tz != NULL)) {
- /* Avoid memcpy. Some old compilers fail to inline it */
- tz->tz_minuteswest = gtod->sys_tz.tz_minuteswest;
- tz->tz_dsttime = gtod->sys_tz.tz_dsttime;
+ tz->tz_minuteswest = gtod->tz_minuteswest;
+ tz->tz_dsttime = gtod->tz_dsttime;
}

return 0;
diff --git a/arch/x86/vdso/vdso32/vclock_gettime.c b/arch/x86/vdso/vdso32/vclock_gettime.c
index 1034dea..5de5057 100644
--- a/arch/x86/vdso/vdso32/vclock_gettime.c
+++ b/arch/x86/vdso/vdso32/vclock_gettime.c
@@ -2,6 +2,8 @@

#ifdef CONFIG_X86_64

+#include <asm/unistd_32_ia32.h>
+
/*
* Due the -m32 compilation, there will be a lot of
* "warning: integer constant is too large for 'unsigned long' type",
@@ -24,12 +26,43 @@
#define __va(x) 0

/*
+ * function load_cr3() in x86/include/asm/processor.h depends on __pa().
+ * Replace by a dummy is save since not used
+ */
+#define __pa(x) 0
+
+/*
+ * Prevents the include of arch/x86/include/asm/spinlock.h, which will generate
+ * a lot of warnings with make C=1.
+ * It is imposible not to include spinlock.h since most kernel headers does
+ * include it.
+ */
+#define _ASM_X86_SPINLOCK_H
+
+/*
+ * dummys for unneeded arck_spin functions
+ */
+static inline void arch_spin_unlock_wait(void *p)
+{
+}
+
+static inline int arch_spin_is_locked(void *p)
+{
+ return 0;
+}
+
+/*
* The define of CONFIG_ILLEGAL_POINTER_VALUE is also to prevent the
* "warning: integer constant is too large..."
*/
#undef CONFIG_ILLEGAL_POINTER_VALUE
#define CONFIG_ILLEGAL_POINTER_VALUE 0

+#else
+
+#define __NR_ia32_clock_gettime __NR_clock_gettime
+#define __NR_ia32_gettimeofday __NR_gettimeofday
+
#endif

#include "../vclock_gettime.c"

Subject: [tip:x86/vdso] x86, vdso: Add a lot more dummy locking functions to vclock_gettime.c

Commit-ID: 4b5e4f908855a66f0957bf35137ce3ee37c05230
Gitweb: http://git.kernel.org/tip/4b5e4f908855a66f0957bf35137ce3ee37c05230
Author: H. Peter Anvin <[email protected]>
AuthorDate: Sun, 16 Feb 2014 16:44:25 -0800
Committer: H. Peter Anvin <[email protected]>
CommitDate: Sun, 16 Feb 2014 16:44:25 -0800

x86, vdso: Add a lot more dummy locking functions to vclock_gettime.c

We need a lot more dummy locking functions to build in all possible
configurations.

Cc: Stefani Seibold <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/vdso/vdso32/vclock_gettime.c | 58 +++++++++++++++++++++++++++++++++--
1 file changed, 55 insertions(+), 3 deletions(-)

diff --git a/arch/x86/vdso/vdso32/vclock_gettime.c b/arch/x86/vdso/vdso32/vclock_gettime.c
index 5de5057..56465ad 100644
--- a/arch/x86/vdso/vdso32/vclock_gettime.c
+++ b/arch/x86/vdso/vdso32/vclock_gettime.c
@@ -40,17 +40,69 @@
#define _ASM_X86_SPINLOCK_H

/*
- * dummys for unneeded arck_spin functions
+ * dummys for unneeded arch_spin functions
*/
-static inline void arch_spin_unlock_wait(void *p)
+static inline void arch_spin_lock(void *lock)
{
}

-static inline int arch_spin_is_locked(void *p)
+static inline int arch_spin_trylock(void *lock)
{
return 0;
}

+static inline void arch_spin_unlock(void *lock)
+{
+}
+
+static inline void arch_spin_lock_flags(void *lock,
+ unsigned long flags)
+{
+}
+
+static inline void arch_spin_unlock_wait(void *lock)
+{
+}
+
+static inline int arch_spin_is_locked(void *lock)
+{
+ return 0;
+}
+
+static inline int arch_read_trylock(void *lock)
+{
+ return 0;
+}
+
+static inline int arch_write_trylock(void *lock)
+{
+ return 0;
+}
+
+static inline void arch_read_lock(void *rw)
+{
+}
+
+static inline void arch_write_lock(void *rw)
+{
+}
+
+static inline void arch_read_lock_flags(void *rw, unsigned long flags)
+{
+}
+
+static inline void arch_write_lock_flags(void *rw, unsigned long flags)
+{
+}
+
+static inline void arch_read_unlock(void *rw)
+{
+}
+
+static inline void arch_write_unlock(void *rw)
+{
+}
+
/*
* The define of CONFIG_ILLEGAL_POINTER_VALUE is also to prevent the
* "warning: integer constant is too large..."

Subject: [tip:x86/vdso] x86, vdso: Revamp vclock_gettime.c

Commit-ID: 0b20a1f58d3502a8dfec98a8926f26c43429bee7
Gitweb: http://git.kernel.org/tip/0b20a1f58d3502a8dfec98a8926f26c43429bee7
Author: Stefani Seibold <[email protected]>
AuthorDate: Sun, 16 Feb 2014 22:52:41 +0100
Committer: H. Peter Anvin <[email protected]>
CommitDate: Sun, 16 Feb 2014 15:06:39 -0800

x86, vdso: Revamp vclock_gettime.c

This intermediate patch revamps vclock_gettime.c by moving some
functions around. This is only code movement, to make the whole
32-bit vdso timer patchset easier to review.

Signed-off-by: Stefani Seibold <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/vdso/vclock_gettime.c | 85 +++++++++++++++++++++---------------------
1 file changed, 42 insertions(+), 43 deletions(-)

diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index eb5d7a5..bbc8065 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -26,41 +26,26 @@

#define gtod (&VVAR(vsyscall_gtod_data))

-notrace static cycle_t vread_tsc(void)
+static notrace cycle_t vread_hpet(void)
{
- cycle_t ret;
- u64 last;
-
- /*
- * Empirically, a fence (of type that depends on the CPU)
- * before rdtsc is enough to ensure that rdtsc is ordered
- * with respect to loads. The various CPU manuals are unclear
- * as to whether rdtsc can be reordered with later loads,
- * but no one has ever seen it happen.
- */
- rdtsc_barrier();
- ret = (cycle_t)vget_cycles();
-
- last = VVAR(vsyscall_gtod_data).clock.cycle_last;
-
- if (likely(ret >= last))
- return ret;
+ return readl((const void __iomem *)fix_to_virt(VSYSCALL_HPET) + HPET_COUNTER);
+}

- /*
- * GCC likes to generate cmov here, but this branch is extremely
- * predictable (it's just a funciton of time and the likely is
- * very likely) and there's a data dependence, so force GCC
- * to generate a branch instead. I don't barrier() because
- * we don't actually need a barrier, and if this function
- * ever gets inlined it will generate worse code.
- */
- asm volatile ("");
- return last;
+notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
+{
+ long ret;
+ asm("syscall" : "=a" (ret) :
+ "0" (__NR_clock_gettime), "D" (clock), "S" (ts) : "memory");
+ return ret;
}

-static notrace cycle_t vread_hpet(void)
+notrace static long vdso_fallback_gtod(struct timeval *tv, struct timezone *tz)
{
- return readl((const void __iomem *)fix_to_virt(VSYSCALL_HPET) + HPET_COUNTER);
+ long ret;
+
+ asm("syscall" : "=a" (ret) :
+ "0" (__NR_gettimeofday), "D" (tv), "S" (tz) : "memory");
+ return ret;
}

#ifdef CONFIG_PARAVIRT_CLOCK
@@ -133,23 +118,37 @@ static notrace cycle_t vread_pvclock(int *mode)
}
#endif

-notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
+notrace static cycle_t vread_tsc(void)
{
- long ret;
- asm("syscall" : "=a" (ret) :
- "0" (__NR_clock_gettime),"D" (clock), "S" (ts) : "memory");
- return ret;
-}
+ cycle_t ret;
+ u64 last;

-notrace static long vdso_fallback_gtod(struct timeval *tv, struct timezone *tz)
-{
- long ret;
+ /*
+ * Empirically, a fence (of type that depends on the CPU)
+ * before rdtsc is enough to ensure that rdtsc is ordered
+ * with respect to loads. The various CPU manuals are unclear
+ * as to whether rdtsc can be reordered with later loads,
+ * but no one has ever seen it happen.
+ */
+ rdtsc_barrier();
+ ret = (cycle_t)vget_cycles();

- asm("syscall" : "=a" (ret) :
- "0" (__NR_gettimeofday), "D" (tv), "S" (tz) : "memory");
- return ret;
-}
+ last = VVAR(vsyscall_gtod_data).clock.cycle_last;

+ if (likely(ret >= last))
+ return ret;
+
+ /*
+ * GCC likes to generate cmov here, but this branch is extremely
+ * predictable (it's just a funciton of time and the likely is
+ * very likely) and there's a data dependence, so force GCC
+ * to generate a branch instead. I don't barrier() because
+ * we don't actually need a barrier, and if this function
+ * ever gets inlined it will generate worse code.
+ */
+ asm volatile ("");
+ return last;
+}

notrace static inline u64 vgetsns(int *mode)
{

2014-02-17 01:48:13

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [tip:x86/vdso] x86, vdso: Add a lot more dummy locking functions to vclock_gettime.c

On 02/16/2014 04:54 PM, tip-bot for H. Peter Anvin wrote:
> Commit-ID: 4b5e4f908855a66f0957bf35137ce3ee37c05230
> Gitweb: http://git.kernel.org/tip/4b5e4f908855a66f0957bf35137ce3ee37c05230
> Author: H. Peter Anvin <[email protected]>
> AuthorDate: Sun, 16 Feb 2014 16:44:25 -0800
> Committer: H. Peter Anvin <[email protected]>
> CommitDate: Sun, 16 Feb 2014 16:44:25 -0800
>
> x86, vdso: Add a lot more dummy locking functions to vclock_gettime.c
>
> We need a lot more dummy locking functions to build in all possible
> configurations.
>

Sigh... the testbot shows that this is even more complicated than that...

-hpa



Subject: [tip:x86/vdso] x86, vdso: Instead of dummy functions, include <linux/spinlock_up.h>

Commit-ID: bd9ee7fd99f127ee1306289415141d45792c97f3
Gitweb: http://git.kernel.org/tip/bd9ee7fd99f127ee1306289415141d45792c97f3
Author: H. Peter Anvin <[email protected]>
AuthorDate: Sun, 16 Feb 2014 19:47:01 -0800
Committer: H. Peter Anvin <[email protected]>
CommitDate: Sun, 16 Feb 2014 19:47:01 -0800

x86, vdso: Instead of dummy functions, include <linux/spinlock_up.h>

The list of dummy functions is insufficient. However, instead of
having a full list of dummy functions we can include
<linux/spinlock_up.h> which contains the (trivial) implementations
that we use on uniprocessor.

There aren't supposed to be any spinlocks at all in the VDSO, of
course.

Cc: Stefani Seibold <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/vdso/vdso32/vclock_gettime.c | 16 ++++------------
1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/arch/x86/vdso/vdso32/vclock_gettime.c b/arch/x86/vdso/vdso32/vclock_gettime.c
index 5de5057..2335f26 100644
--- a/arch/x86/vdso/vdso32/vclock_gettime.c
+++ b/arch/x86/vdso/vdso32/vclock_gettime.c
@@ -36,20 +36,12 @@
* a lot of warnings with make C=1.
* It is imposible not to include spinlock.h since most kernel headers does
* include it.
+ *
+ * <linux/spinlock_up.h> includes the minimal functions which are used
+ * on UP; include it instead.
*/
#define _ASM_X86_SPINLOCK_H
-
-/*
- * dummys for unneeded arck_spin functions
- */
-static inline void arch_spin_unlock_wait(void *p)
-{
-}
-
-static inline int arch_spin_is_locked(void *p)
-{
- return 0;
-}
+#include <linux/spinlock_up.h>

/*
* The define of CONFIG_ILLEGAL_POINTER_VALUE is also to prevent the

2014-02-17 04:06:54

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [tip:x86/vdso] x86, vdso: Instead of dummy functions, include <linux/spinlock_up.h>

On 02/16/2014 07:51 PM, tip-bot for H. Peter Anvin wrote:
> Commit-ID: bd9ee7fd99f127ee1306289415141d45792c97f3
> Gitweb: http://git.kernel.org/tip/bd9ee7fd99f127ee1306289415141d45792c97f3
> Author: H. Peter Anvin <[email protected]>
> AuthorDate: Sun, 16 Feb 2014 19:47:01 -0800
> Committer: H. Peter Anvin <[email protected]>
> CommitDate: Sun, 16 Feb 2014 19:47:01 -0800
>
> x86, vdso: Instead of dummy functions, include <linux/spinlock_up.h>
>
> The list of dummy functions is insufficient. However, instead of
> having a full list of dummy functions we can include
> <linux/spinlock_up.h> which contains the (trivial) implementations
> that we use on uniprocessor.
>
> There aren't supposed to be any spinlocks at all in the VDSO, of
> course.
>

That didn't work either. I thought I was clever, but it didn't work at
all. Multiple build failures across numerous configurations. This is
turning into a total headache.

The "right" way to fix this is presumably to refactor a bunch of header
files so that the vdso code doesn't have to include a bunch of kernel
internal headers, but that is a lot of work.

-hpa

2014-02-17 07:41:21

by Stefani Seibold

[permalink] [raw]
Subject: Re: [tip:x86/vdso] x86, vdso: Instead of dummy functions, include <linux/spinlock_up.h>

Hi Peter,

On Sunday, 16.02.2014, 20:06 -0800, H. Peter Anvin wrote:
> On 02/16/2014 07:51 PM, tip-bot for H. Peter Anvin wrote:
> > Commit-ID: bd9ee7fd99f127ee1306289415141d45792c97f3
> > Gitweb: http://git.kernel.org/tip/bd9ee7fd99f127ee1306289415141d45792c97f3
> > Author: H. Peter Anvin <[email protected]>
> > AuthorDate: Sun, 16 Feb 2014 19:47:01 -0800
> > Committer: H. Peter Anvin <[email protected]>
> > CommitDate: Sun, 16 Feb 2014 19:47:01 -0800
> >
> > x86, vdso: Instead of dummy functions, include <linux/spinlock_up.h>
> >
> > The list of dummy functions is insufficient. However, instead of
> > having a full list of dummy functions we can include
> > <linux/spinlock_up.h> which contains the (trivial) implementations
> > that we use on uniprocessor.
> >
> > There aren't supposed to be any spinlocks at all in the VDSO, of
> > course.
> >
>
> That didn't work either. I thought I was clever, but it didn't work at
> all. Multiple build failures across numerous configurations. This is
> turning into a total headache.
>
> The "right" way to fix this is presumably to refactor a bunch of header
> files so that the vdso code doesn't have to include a bunch of kernel
> internal headers, but that is a lot of work.
>

I think for the time being it will be okay to kick out the
_ASM_X86_SPINLOCK_H hack and accept the C=1 warnings.

As a next step it is necessary to make the whole BUILD_VDSO32 path in
vclock_gettime.c independent from the kernel headers; only uapi/ should
be included.

The use of cycle_t must be replaced with u64.

We need our own copies of __native_read_tsc(), __iter_div_u64_rem(), smp_rmb()
and cpu_relax().

For the non-BUILD_VDSO32 path we only need to move the #includes inside
this #ifndef BUILD_VDSO32.

- Stefani
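
For reference, freestanding 32-bit versions of these helpers might look
like the following sketch; the vdso_* names are made up for the example
and this is not what was merged:

	static inline unsigned long long vdso_read_tsc(void)
	{
		unsigned long long v;

		/* "=A" returns the EDX:EAX register pair on i386 */
		asm volatile("rdtsc" : "=A" (v));
		return v;
	}

	static inline unsigned vdso_iter_div_u64_rem(unsigned long long dividend,
			unsigned divisor, unsigned long long *remainder)
	{
		unsigned ret = 0;

		/* iterative subtraction: the quotient is tiny here, so
		   this is cheaper than a full 64-bit division */
		while (dividend >= divisor) {
			dividend -= divisor;
			ret++;
		}
		*remainder = dividend;
		return ret;
	}

	/* x86 does not reorder loads, a compiler barrier is enough */
	#define vdso_smp_rmb()   asm volatile("" ::: "memory")

	/* "rep; nop" is the PAUSE spin-wait hint */
	#define vdso_cpu_relax() asm volatile("rep; nop" ::: "memory")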

2014-02-17 09:28:27

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [tip:x86/vdso] x86, vdso: Instead of dummy functions, include <linux/spinlock_up.h>

On 02/16/2014 11:42 PM, Stefani Seibold wrote:
> I think for the time being it will be okay to kick out the
> _ASM_X86_SPINLOCK_H hack and accept the C=1 warnings.
>
> As a next step it is necessary to make the whole BUILD_VDSO32 path in
> vclock_gettime.c independent from the kernel headers; only uapi/ should
> be included.
>
> The use of cycle_t must be replaced with u64.
>
> We need our own copies of __native_read_tsc(), __iter_div_u64_rem(), smp_rmb()
> and cpu_relax().

All of which are quite trivial.

> For the non-BUILD_VDSO32 path we only need to move the #includes inside
> this #ifndef BUILD_VDSO32.

Sorry, didn't quite follow that.

-hpa

2014-02-17 09:45:05

by Stefani Seibold

[permalink] [raw]
Subject: Re: [tip:x86/vdso] x86, vdso: Instead of dummy functions, include <linux/spinlock_up.h>

On Monday, 17.02.2014, 01:27 -0800, H. Peter Anvin wrote:
> On 02/16/2014 11:42 PM, Stefani Seibold wrote:
> > I think for the time being it will be okay to kick out the
> > _ASM_X86_SPINLOCK_H hack and accept the C=1 warnings.
> >
> > As a next step it is necessary to make the whole BUILD_VDSO32 path in
> > vclock_gettime.c independent from the kernel headers; only uapi/ should
> > be included.
> >
> > The use of cycle_t must be replaced with u64.
> >
> > We need our own copies of __native_read_tsc(), __iter_div_u64_rem(), smp_rmb()
> > and cpu_relax().
>
> All of which are quite trivial.
>
> > For the non-BUILD_VDSO32 path we only need to move the #includes inside
> > this #ifndef BUILD_VDSO32.
>
> Sorry, didn't quite follow that.
>

The solution is quite simple: in case of a 32-bit VDSO for a 64-bit
kernel, fake a 32-bit kernel configuration. Then everything is fine and
all kernel headers will compile without warnings or errors; also, make
C=1 will give no complaints.

The arch/x86/vdso/vdso32/vclock_gettime.c will now look like:

#define BUILD_VDSO32

#ifdef CONFIG_X86_64

/*
* in case of a 32 bit VDSO for a 64 bit kernel fake a 32 bit kernel
* configuration
*/
#undef CONFIG_64BIT
#undef CONFIG_X86_64
#undef CONFIG_ILLEGAL_POINTER_VALUE

#define CONFIG_X86_32 1
#define CONFIG_PAGE_OFFSET 0
#define CONFIG_ILLEGAL_POINTER_VALUE 0

#define BUILD_VDSO32_64

#endif

#include "../vclock_gettime.c"

and the following modifications for arch/x86/include/asm/vgtod.h:

#ifdef BUILD_VDSO32_64
typedef u64 gtod_long_t;
#else
typedef unsigned long gtod_long_t;
#endif

The u64 is needed because the 64 bit kernel writes these fields as
unsigned long, i.e. as 64 bit values, so the 32 bit VDSO must use u64 to
see the same structure layout.

I tested it and I see no side effects. What do you think?

- Stefani

2014-02-17 09:50:32

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [tip:x86/vdso] x86, vdso: Instead of dummy functions, include <linux/spinlock_up.h>

On 02/17/2014 01:46 AM, Stefani Seibold wrote:
> [...]
>
> The solution is quite simple: in the case of a 32 bit VDSO for a 64 bit
> kernel, fake a 32 bit kernel configuration. Then everything is fine and
> all kernel headers will compile without warnings or errors; make C=1
> will also give no complaints.
>
> The arch/x86/vdso/vdso32/vclock_gettime.c will now look like:
>
> #define BUILD_VDSO32
>
> #ifdef CONFIG_X86_64
>
> /*
> * in case of a 32 bit VDSO for a 64 bit kernel fake a 32 bit kernel
> * configuration
> */
> #undef CONFIG_64BIT
> #undef CONFIG_X86_64
> #undef CONFIG_ILLEGAL_POINTER_VALUE
>
> #define CONFIG_X86_32 1
> #define CONFIG_PAGE_OFFSET 0
> #define CONFIG_ILLEGAL_POINTER_VALUE 0
>
> #define BUILD_VDSO32_64
>
> #endif
>
> #include "../vclock_gettime.c"
>
> and the following modifications for arch/x86/include/asm/vgtod.h:
>
> #ifdef BUILD_VDSO32_64
> typedef u64 gtod_long_t;
> #else
> typedef unsigned long gtod_long_t;
> #endif
>
> I tested it and I see no side effects. What do you think?
>

Clever. It is still a hack of course, and it would be better to get
to the point where we don't include random kernel headers but only uapi
headers plus special headers sanitized specifically for the vdso, but
the above looks like a good intermediate hack.

-hpa

2014-02-17 10:06:20

by Stefani Seibold

[permalink] [raw]
Subject: Re: [tip:x86/vdso] x86, vdso: Instead of dummy functions, include <linux/spinlock_up.h>

On Monday, 17.02.2014 at 01:50 -0800, H. Peter Anvin wrote:
> On 02/17/2014 01:46 AM, Stefani Seibold wrote:
> > [...]
> >
> > I tested it and I see no side effects. What do you think?
> >
>
> Clever. It is still a hack of course, and it would be better to get
> to the point where we don't include random kernel headers but only uapi
> headers plus special headers sanitized specifically for the vdso, but
> the above looks like a good intermediate hack.
>

Yes, but I think this is nearly impossible, because a lot of headers
would have to be modified and nobody can predict the side effects.

- Stefani

Subject: [tip:x86/vdso] mm: Clean up style in install_special_mapping()

Commit-ID: 3af7111e2066a641510c16a4e9e82dd81550115b
Gitweb: http://git.kernel.org/tip/3af7111e2066a641510c16a4e9e82dd81550115b
Author: H. Peter Anvin <[email protected]>
AuthorDate: Wed, 19 Feb 2014 20:46:57 -0800
Committer: H. Peter Anvin <[email protected]>
CommitDate: Wed, 19 Feb 2014 20:46:57 -0800

mm: Clean up style in install_special_mapping()

We can clean up the style in install_special_mapping(), and make it
use PTR_ERR_OR_ZERO().

Reported-by: kbuild test robot <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
---
mm/mmap.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 81ba54f..6b78a77 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2959,12 +2959,10 @@ int install_special_mapping(struct mm_struct *mm,
unsigned long addr, unsigned long len,
unsigned long vm_flags, struct page **pages)
{
- struct vm_area_struct *vma = _install_special_mapping(mm,
- addr, len, vm_flags, pages);
+ struct vm_area_struct *vma;

- if (IS_ERR(vma))
- return PTR_ERR(vma);
- return 0;
+ vma = _install_special_mapping(mm, addr, len, vm_flags, pages);
+ return PTR_ERR_OR_ZERO(vma);
}

static DEFINE_MUTEX(mm_all_locks_mutex);
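
For reference, PTR_ERR_OR_ZERO() from <linux/err.h> just folds the
IS_ERR() check and the PTR_ERR() conversion into a single helper,
essentially:

static inline int __must_check PTR_ERR_OR_ZERO(__force const void *ptr)
{
	if (IS_ERR(ptr))
		return PTR_ERR(ptr);
	else
		return 0;
}

which is why the two-branch return above collapses into a single line.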