2019-11-12 01:32:59

by Dmitry Safonov

[permalink] [raw]
Subject: [PATCHv8 00/34] kernel: Introduce Time Namespace

Discussions around time namespace are there for a long time. The first
attempt to implement it was in 2006 by Jeff Dike. From that time, the
topic appears on and off in various discussions.

There are two main use cases for time namespaces:
1. change date and time inside a container;
2. adjust clocks for a container restored from a checkpoint.

“It seems like this might be one of the last major obstacles keeping
migration from being used in production systems, given that not all
containers and connections can be migrated as long as a time dependency
is capable of messing it up.” (by github.com/dav-ell)

The kernel provides access to several clocks: CLOCK_REALTIME,
CLOCK_MONOTONIC, CLOCK_BOOTTIME. Last two clocks are monotonous, but the
start points for them are not defined and are different for each
system. When a container is migrated from one node to another, all
clocks have to be restored into consistent states; in other words, they
have to continue running from the same points where they have been
dumped.

The main idea of this patch set is adding per-namespace offsets for
system clocks. When a process in a non-root time namespace requests
time of a clock, a namespace offset is added to the current value of
this clock and the sum is returned.

All offsets are placed on a separate page, this allows us to map it as
part of VVAR into user processes and use offsets from VDSO calls.

Now offsets are implemented for CLOCK_MONOTONIC and CLOCK_BOOTTIME
clocks.

v7..v8 Changes:
* Fix compile-time errors:
- on architectures without the support of time namespaces.
- when CONFIG_POSIX_TIMERS isn't set.
* Added checks in selftests for CONFIG_POSIX_TIMERS.
* Inline do_hres and do_coarse.
(And added Tested-by Vincenzo - thanks!)
* Make TIME_NS depends on GENERIC_VDSO_TIME_NS and set it per-arch.

[v1..v7 Changelogs is at the very bottom here]

Our performance measurements show that the price of VDSO's clock_gettime()
in a child time namespace is about 8% with a hot CPU cache and about 90%
with a cold CPU cache. There is no performance regression for host
processes outside time namespace on those tests.

We wrote two small benchmarks. The first one gettime_perf.c calls
clock_gettime() in a loop for 3 seconds. It shows us performance with
a hot CPU cache (more clock_gettime() cycles - the better):

The first table shows performance of clock_gettime() in the root time
namespace.

| before | TIME_NS=n | TIMENS=y
-------------------------------------------
| 150363883 | 167076184 | 164979177
| 150616056 | 167348942 | 165202727
| 150679279 | 167235485 | 165230267
| 150622312 | 167078735 | 165284077
| 150706992 | 167301837 | 165372663
| 150563707 | 167207900 | 165395728
-------------------------------------------
avg | 150592038 | 167208180 | 165244106
diff % | 100 | 111 | 109.7
-------------------------------------------
stdev % | 0.08 | 0.07 | 0.1

We can see the 11% performance improvement when CONFIG_TIME_NS is
disabled. This is achieved by adding the unlikely hint into
vdso_read_begin() and inlining do_hres() and do_coarse().

When CONFIG_TIME_NS is enabled, there is one more clobbered register in
the __vdso_clock_gettime function. And this fact explains the performance
difference between the two right columns.

The second table shows the performance of clock_gettime in a non-root
time namespace.

| before | host | inside timens
----------------------------------------------
| 150363883 | 164979177 | 138133479
| 150616056 | 165202727 | 139047394
| 150679279 | 165230267 | 139284611
| 150622312 | 165284077 | 139263753
| 150706992 | 165372663 | 139175419
| 150563707 | 165395728 | 139334291
----------------------------------------------
avg | 150592038 | 165244106 | 139039824
diff % | 100 | 109.7 | 92.3
----------------------------------------------
stdev % | 0.08 | 0.1 | 0.3

In a sub-namespace, the performance hit is 7-8%. The bigger difference
between root and non-root namespaces can be explained by the fact that
do_{hres,coarse}_timens are not inlined. Inlining these functions
improves performance in a sub-namespace, but there will be more
clobbered registers in __vdso_clock_gettime what will decrease the
performance in the root namespace.

The gettime_perf_cold test does 10K iterations. In each iteration, it
drops cpu caches for vdso pages, clflush() is used for this, then it runs
rdtsc(); clock_gettime; rdtsc(); and prints the number of tsc cycles.

Cold CPU cache (lesser tsc per cycle - the better):

| before | CONFIG_TIME_NS=n | host | inside timens
--------------------------------------------------------------
tsc | 476 | 480 | 487 | 531
stdev(tsc) | 0.6 | 1.3 | 4.3 | 5.7
diff (%) | 100 | 100.9 | 102 | 112

vdsotest results: https://gist.github.com/avagin/f290afb8b721ae0522a561d585f34de0

The numbers gathered on Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz.

Cc: Adrian Reber <[email protected]>
Cc: Andrei Vagin <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Dmitry Safonov <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]

v8 on github (if someone prefers `git pull` to `git am`):
https://github.com/0x7f454c46/linux/tree/timens-v8

v7: https://lkml.kernel.org/r/[email protected]
v6: https://lkml.kernel.org/r/[email protected]
v5: https://lkml.kernel.org/r/[email protected]
v4: https://lkml.kernel.org/r/[email protected]
v3: https://lkml.kernel.org/r/[email protected]
v2: https://lore.kernel.org/lkml/[email protected]/
RFC: https://lkml.kernel.org/r/[email protected]/

v6..v7 Changes:
* Based on Andy & Thomas suggestions and the patches that Thomas kindely
sent, reworked from two VDSO code images into trick with odd seq
number for timens page that goes on the place of vvar page inside ns.
* Moved kernel/time_namespace.c => kernel/time/namespace.c
* Fixed bpf 5sec example
* Added selftests outputs
* By Thomas's suggestion simplified overflow check as ktime_sub(tim, offset)
* Other Thomas's review notes: stylistic, simplifications and
clearifications (Thanks!)
* Split VDSO patches on generic/x86 parts
* Fixed kernel-doc warnings
* Added checks in selftests for capabilities
* Fixed bisectability issues

v5..v6 Changes:
* Used current_is_single_threaded() instead of thread_group_empty()
(Thanks for the review, Andy).
* Changed errno code when there are threads on timens joining to
something more grepabble (EUSERS).
* posix_get_timespec() should have been posix_get_monotonic_timespec()
(Thanks, Thomas)
* timens_add_monotonic() & timens_add_boottime() were relocated to
the patch that introduces (struct timens_offsets) (Thomas)
* Avoid breaking alarmtimer for ALARM_REALTIME (Thanks, Thomas)
* Nested namespace inherits father's offsets now
(Andrei while working on CRIU side for time namespace)
* A minor conflict with commit dbc1625fc9de ("hrtimer: Consolidate
hrtimer_init() + hrtimer_init_sleeper() calls") in linux-next
[Sending against next-20190814]

v4..v5 Changes:
* Rebased over generic vdso (already in master)
* Addressing review comments by Thomas Gleixner
(thanks much for your time and patience):
- Dropping `timens` prefix from subjects (it's not a subsystem)
- Keeping commit messages in a neutral technical form
- Splitting unreasonably large patches
- Document code with missing comments
- Dropped dead code that's not compiled with !CONFIG_TIME_NS
* Updated performance results [here, at the bottom]
* Split vdso jump tables patch
* Allow unshare() with many threads: it's safe until fork()/clone(),
where we check for CLONE_THREADS
* Add missed check in setns() for CLONE_VM | CLONE_THREADS
* Fixed compilation with !CONFIG_UTS_NS
* Add a plan in selftests (prevents new warning "Planned tests != run tests")
* Set jump table section address & size to (-1UL) just in case if there
is no such section while running vdso2c (and WARN() on boot in such
case)

v3..v4 Changes:

* CLOCKE_NEWTIME is unshare()-only flag now (CLON_PIDFD took previous value)
* Addressing Jann Horn's feedback - we don't allow CLONE_THREAD or
CLONE_VM together with CLONE_NEWTIME (thanks for spotting!)
* Addressing issues found by Thomas - removed unmaintainable CLOCK_TIMENS
and introduced another call back into k_clock to get ktime instead
of getting timespec and converting it (Patch 03)
* Renaming timens_offsets members to omit _offset postfix
(thanks Cyrill for the suggestion)
* Suggestions, renaming and making code more maintainable from Thomas's
feedback (thanks much!)
* Fixing out-of-bounds and other issues in procfs file (kudos Jann Horn)
* vdso_fault() can be called on a remote task by /proc/$pid/mem or
process_vm_readv() - addressed by adding a slow-path with searching
for owner's namespace (thanks for spotting this unobvious issue, Jann)
* Other nits by Jann Horn

v2..v3: Major changes:

* Simplify two VDSO images by using static_branch() in vclock_gettime()
Removes unwanted conflicts with generic VDSO movement patches and
simplifies things by dropping too invasive linker magic.
As an alternative to static_branch() we tested an attempt to introduce
home-made dynamic patching called retcalls:
https://github.com/0x7f454c46/linux/commit/4cc0180f6d65
Considering some theoretical problems with toolchains, we decided to go
with long well-tested nop-patching in static_branch(). Though, it was
needed to provide backend for relative code.

* address Thomas' comments.
* add sanity checks for offsets:
- the current clock time in a namespace has to be in [0, KTIME_MAX / 2).
KTIME_MAX is divided by two here to be sure that the KTIME_MAX limit
is still unreachable.
Link: https://lkml.org/lkml/2018/9/19/950
Link: https://lkml.org/lkml/2019/2/5/867

v1..v2: There are two major changes:

* Two versions of the VDSO library to avoid a performance penalty for
host tasks outside time namespace (as suggested by Andy and Thomas).

As it has been discussed on timens RFC, adding a new conditional branch
`if (inside_time_ns)` on VDSO for all processes is undesirable.
It will add a penalty for everybody as branch predictor may mispredict
the jump. Also there are instruction cache lines wasted on cmp/jmp.

Those effects of introducing time namespace are very much unwanted
having in mind how much work have been spent on micro-optimisation
VDSO code.

Addressing those problems, there are two versions of VDSO's .so:
for host tasks (without any penalty) and for processes inside of time
namespace with clk_to_ns() that subtracts offsets from host's time.


* Allow to set clock offsets for a namespace only before any processes
appear in it.

Now a time namespace looks similar to a pid namespace in a way how it is
created: unshare(CLONE_NEWTIME) system call creates a new time namespace,
but doesn't set it to the current process. Then all children of
the process will be born in the new time namespace, or a process can
use the setns() system call to join a namespace.

This scheme allows to create a new time namespaces, set clock offsets
and then populate the namespace with processes.

Andrei Vagin (23):
lib/vdso: Add unlikely() hint into vdso_read_begin()
lib/vdso: make do_hres and do_coarse as __always_inline
ns: Introduce Time Namespace
time: Add timens_offsets to be used for tasks in timens
posix-clocks: Rename the clock_get() callback to clock_get_timespec()
posix-clocks: Rename .clock_get_timespec() callbacks accordingly
alarmtimer: Rename gettime() callback to get_ktime()
alarmtimer: Provide get_timespec() callback
posix-clocks: Introduce clock_get_ktime() callback
posix-timers: Use clock_get_ktime() in common_timer_get()
posix-clocks: Wire up clock_gettime() with timens offsets
kernel: Add do_timens_ktime_to_host() helper
timerfd: Make timerfd_settime() time namespace aware
posix-timers: Make timer_settime() time namespace aware
alarmtimer: Make nanosleep time namespace aware
hrtimers: Prepare hrtimer_nanosleep() for time namespaces
posix-timers: Make clock_nanosleep() time namespace aware
fs/proc: Introduce /proc/pid/timens_offsets
selftests/timens: Add a test for timerfd
selftests/timens: Add a test for clock_nanosleep()
selftests/timens: Add timer offsets test
selftests/timens: Add a simple perf test for clock_gettime()
selftests/timens: Check for right timens offsets after fork and exec

Dmitry Safonov (10):
fs/proc: Respect boottime inside time namespace for /proc/uptime
x86/vdso: Restrict splitting VVAR VMA
x86/vdso: Provide vdso_data offset on vvar_page
x86/vdso: Add timens page
time: Allocate per-timens vvar page
x86/vdso: Handle faults on timens page
x86/vdso: On timens page fault prefault also VVAR page
x86/vdso: Zap vvar pages on switch a time namspace
selftests/timens: Add Time Namespace test for supported clocks
selftests/timens: Add procfs selftest

Thomas Gleixner (1):
lib/vdso: Prepare for time namespace support

MAINTAINERS | 2 +
arch/x86/Kconfig | 1 +
arch/x86/entry/vdso/vdso-layout.lds.S | 13 +-
arch/x86/entry/vdso/vdso2c.c | 3 +
arch/x86/entry/vdso/vma.c | 119 ++++-
arch/x86/include/asm/vdso.h | 1 +
arch/x86/include/asm/vdso/gettimeofday.h | 9 +
arch/x86/include/asm/vvar.h | 13 +-
arch/x86/kernel/vmlinux.lds.S | 4 +-
fs/proc/base.c | 95 ++++
fs/proc/namespaces.c | 4 +
fs/proc/uptime.c | 3 +
fs/timerfd.c | 3 +
include/linux/hrtimer.h | 2 +-
include/linux/nsproxy.h | 2 +
include/linux/proc_ns.h | 3 +
include/linux/time.h | 6 +
include/linux/time_namespace.h | 128 +++++
include/linux/user_namespace.h | 1 +
include/uapi/linux/sched.h | 6 +
include/vdso/datapage.h | 19 +-
include/vdso/helpers.h | 2 +-
init/Kconfig | 8 +
kernel/fork.c | 16 +-
kernel/nsproxy.c | 41 +-
kernel/time/Makefile | 1 +
kernel/time/alarmtimer.c | 73 ++-
kernel/time/hrtimer.c | 8 +-
kernel/time/namespace.c | 466 ++++++++++++++++++
kernel/time/posix-clock.c | 8 +-
kernel/time/posix-cpu-timers.c | 32 +-
kernel/time/posix-stubs.c | 15 +-
kernel/time/posix-timers.c | 88 +++-
kernel/time/posix-timers.h | 7 +-
lib/vdso/Kconfig | 6 +
lib/vdso/gettimeofday.c | 164 +++++-
mm/mmap.c | 2 +
tools/perf/examples/bpf/5sec.c | 6 +-
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/timens/.gitignore | 8 +
tools/testing/selftests/timens/Makefile | 7 +
.../selftests/timens/clock_nanosleep.c | 150 ++++++
tools/testing/selftests/timens/config | 1 +
tools/testing/selftests/timens/exec.c | 94 ++++
tools/testing/selftests/timens/gettime_perf.c | 95 ++++
tools/testing/selftests/timens/log.h | 26 +
tools/testing/selftests/timens/procfs.c | 144 ++++++
tools/testing/selftests/timens/timens.c | 190 +++++++
tools/testing/selftests/timens/timens.h | 100 ++++
tools/testing/selftests/timens/timer.c | 123 +++++
tools/testing/selftests/timens/timerfd.c | 129 +++++
51 files changed, 2337 insertions(+), 111 deletions(-)
create mode 100644 include/linux/time_namespace.h
create mode 100644 kernel/time/namespace.c
create mode 100644 tools/testing/selftests/timens/.gitignore
create mode 100644 tools/testing/selftests/timens/Makefile
create mode 100644 tools/testing/selftests/timens/clock_nanosleep.c
create mode 100644 tools/testing/selftests/timens/config
create mode 100644 tools/testing/selftests/timens/exec.c
create mode 100644 tools/testing/selftests/timens/gettime_perf.c
create mode 100644 tools/testing/selftests/timens/log.h
create mode 100644 tools/testing/selftests/timens/procfs.c
create mode 100644 tools/testing/selftests/timens/timens.c
create mode 100644 tools/testing/selftests/timens/timens.h
create mode 100644 tools/testing/selftests/timens/timer.c
create mode 100644 tools/testing/selftests/timens/timerfd.c

--
2.24.0


2019-11-12 01:33:09

by Dmitry Safonov

[permalink] [raw]
Subject: [PATCHv8 25/34] x86/vdso: On timens page fault prefault also VVAR page

As timens page has offsets to data on VVAR page VVAR is going
to be accessed shortly. Set it up with timens in one page fault
as optimization.

Suggested-by: Thomas Gleixner <[email protected]>
Co-developed-by: Andrei Vagin <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: Dmitry Safonov <[email protected]>
---
arch/x86/entry/vdso/vma.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index f6e13ab29d94..d6cb8a16f368 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -169,8 +169,23 @@ static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
* offset.
* See also the comment near timens_setup_vdso_data().
*/
- if (timens_page)
+ if (timens_page) {
+ unsigned long addr;
+ vm_fault_t err;
+
+ /*
+ * Optimization: inside time namespace pre-fault
+ * VVAR page too. As on timens page there are only
+ * offsets for clocks on VVAR, it'll be faulted
+ * shortly by VDSO code.
+ */
+ addr = vmf->address + (image->sym_timens_page - sym_offset);
+ err = vmf_insert_pfn(vma, addr, pfn);
+ if (unlikely(err & VM_FAULT_ERROR))
+ return err;
+
pfn = page_to_pfn(timens_page);
+ }

return vmf_insert_pfn(vma, vmf->address, pfn);
} else if (sym_offset == image->sym_pvclock_page) {
--
2.24.0

2019-11-12 01:34:05

by Dmitry Safonov

[permalink] [raw]
Subject: [PATCHv8 05/34] posix-clocks: Rename the clock_get() callback to clock_get_timespec()

From: Andrei Vagin <[email protected]>

The upcoming support for time namespaces requires to have access to:
- The time in a task's time namespace for sys_clock_gettime()
- The time in the root name space for common_timer_get()

That adds a valid reason to finally implement a separate callback which
returns the time in ktime_t format, rather than in (struct timespec).

Rename the clock_get() callback to clock_get_timespec() as a preparation
for introducing clock_get_ktime().

Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Co-developed-by: Dmitry Safonov <[email protected]>
Signed-off-by: Dmitry Safonov <[email protected]>
---
kernel/time/alarmtimer.c | 4 ++--
kernel/time/posix-clock.c | 8 ++++----
kernel/time/posix-cpu-timers.c | 32 ++++++++++++++++----------------
kernel/time/posix-timers.c | 22 +++++++++++-----------
kernel/time/posix-timers.h | 4 ++--
5 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index 451f9d05ccfe..8523df726fee 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -657,7 +657,7 @@ static int alarm_clock_getres(const clockid_t which_clock, struct timespec64 *tp
}

/**
- * alarm_clock_get - posix clock_get interface
+ * alarm_clock_get - posix clock_get_timespec interface
* @which_clock: clockid
* @tp: timespec to fill.
*
@@ -837,7 +837,7 @@ static int alarm_timer_nsleep(const clockid_t which_clock, int flags,

const struct k_clock alarm_clock = {
.clock_getres = alarm_clock_getres,
- .clock_get = alarm_clock_get,
+ .clock_get_timespec = alarm_clock_get,
.timer_create = alarm_timer_create,
.timer_set = common_timer_set,
.timer_del = common_timer_del,
diff --git a/kernel/time/posix-clock.c b/kernel/time/posix-clock.c
index ec960bb939fd..c8f9c9b1cd82 100644
--- a/kernel/time/posix-clock.c
+++ b/kernel/time/posix-clock.c
@@ -315,8 +315,8 @@ static int pc_clock_settime(clockid_t id, const struct timespec64 *ts)
}

const struct k_clock clock_posix_dynamic = {
- .clock_getres = pc_clock_getres,
- .clock_set = pc_clock_settime,
- .clock_get = pc_clock_gettime,
- .clock_adj = pc_clock_adjtime,
+ .clock_getres = pc_clock_getres,
+ .clock_set = pc_clock_settime,
+ .clock_get_timespec = pc_clock_gettime,
+ .clock_adj = pc_clock_adjtime,
};
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 42d512fcfda2..8ff6da77a01f 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -1391,26 +1391,26 @@ static int thread_cpu_timer_create(struct k_itimer *timer)
}

const struct k_clock clock_posix_cpu = {
- .clock_getres = posix_cpu_clock_getres,
- .clock_set = posix_cpu_clock_set,
- .clock_get = posix_cpu_clock_get,
- .timer_create = posix_cpu_timer_create,
- .nsleep = posix_cpu_nsleep,
- .timer_set = posix_cpu_timer_set,
- .timer_del = posix_cpu_timer_del,
- .timer_get = posix_cpu_timer_get,
- .timer_rearm = posix_cpu_timer_rearm,
+ .clock_getres = posix_cpu_clock_getres,
+ .clock_set = posix_cpu_clock_set,
+ .clock_get_timespec = posix_cpu_clock_get,
+ .timer_create = posix_cpu_timer_create,
+ .nsleep = posix_cpu_nsleep,
+ .timer_set = posix_cpu_timer_set,
+ .timer_del = posix_cpu_timer_del,
+ .timer_get = posix_cpu_timer_get,
+ .timer_rearm = posix_cpu_timer_rearm,
};

const struct k_clock clock_process = {
- .clock_getres = process_cpu_clock_getres,
- .clock_get = process_cpu_clock_get,
- .timer_create = process_cpu_timer_create,
- .nsleep = process_cpu_nsleep,
+ .clock_getres = process_cpu_clock_getres,
+ .clock_get_timespec = process_cpu_clock_get,
+ .timer_create = process_cpu_timer_create,
+ .nsleep = process_cpu_nsleep,
};

const struct k_clock clock_thread = {
- .clock_getres = thread_cpu_clock_getres,
- .clock_get = thread_cpu_clock_get,
- .timer_create = thread_cpu_timer_create,
+ .clock_getres = thread_cpu_clock_getres,
+ .clock_get_timespec = thread_cpu_clock_get,
+ .timer_create = thread_cpu_timer_create,
};
diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index 0ec5b7a1d769..44d4f9cb782d 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -667,7 +667,7 @@ void common_timer_get(struct k_itimer *timr, struct itimerspec64 *cur_setting)
* The timespec64 based conversion is suboptimal, but it's not
* worth to implement yet another callback.
*/
- kc->clock_get(timr->it_clock, &ts64);
+ kc->clock_get_timespec(timr->it_clock, &ts64);
now = timespec64_to_ktime(ts64);

/*
@@ -781,7 +781,7 @@ static void common_hrtimer_arm(struct k_itimer *timr, ktime_t expires,
* Posix magic: Relative CLOCK_REALTIME timers are not affected by
* clock modifications, so they become CLOCK_MONOTONIC based under the
* hood. See hrtimer_init(). Update timr->kclock, so the generic
- * functions which use timr->kclock->clock_get() work.
+ * functions which use timr->kclock->clock_get_timespec() work.
*
* Note: it_clock stays unmodified, because the next timer_set() might
* use ABSTIME, so it needs to switch back.
@@ -1067,7 +1067,7 @@ SYSCALL_DEFINE2(clock_gettime, const clockid_t, which_clock,
if (!kc)
return -EINVAL;

- error = kc->clock_get(which_clock, &kernel_tp);
+ error = kc->clock_get_timespec(which_clock, &kernel_tp);

if (!error && put_timespec64(&kernel_tp, tp))
error = -EFAULT;
@@ -1149,7 +1149,7 @@ SYSCALL_DEFINE2(clock_gettime32, clockid_t, which_clock,
if (!kc)
return -EINVAL;

- err = kc->clock_get(which_clock, &ts);
+ err = kc->clock_get_timespec(which_clock, &ts);

if (!err && put_old_timespec32(&ts, tp))
err = -EFAULT;
@@ -1261,7 +1261,7 @@ SYSCALL_DEFINE4(clock_nanosleep_time32, clockid_t, which_clock, int, flags,

static const struct k_clock clock_realtime = {
.clock_getres = posix_get_hrtimer_res,
- .clock_get = posix_clock_realtime_get,
+ .clock_get_timespec = posix_clock_realtime_get,
.clock_set = posix_clock_realtime_set,
.clock_adj = posix_clock_realtime_adj,
.nsleep = common_nsleep,
@@ -1279,7 +1279,7 @@ static const struct k_clock clock_realtime = {

static const struct k_clock clock_monotonic = {
.clock_getres = posix_get_hrtimer_res,
- .clock_get = posix_ktime_get_ts,
+ .clock_get_timespec = posix_ktime_get_ts,
.nsleep = common_nsleep,
.timer_create = common_timer_create,
.timer_set = common_timer_set,
@@ -1295,22 +1295,22 @@ static const struct k_clock clock_monotonic = {

static const struct k_clock clock_monotonic_raw = {
.clock_getres = posix_get_hrtimer_res,
- .clock_get = posix_get_monotonic_raw,
+ .clock_get_timespec = posix_get_monotonic_raw,
};

static const struct k_clock clock_realtime_coarse = {
.clock_getres = posix_get_coarse_res,
- .clock_get = posix_get_realtime_coarse,
+ .clock_get_timespec = posix_get_realtime_coarse,
};

static const struct k_clock clock_monotonic_coarse = {
.clock_getres = posix_get_coarse_res,
- .clock_get = posix_get_monotonic_coarse,
+ .clock_get_timespec = posix_get_monotonic_coarse,
};

static const struct k_clock clock_tai = {
.clock_getres = posix_get_hrtimer_res,
- .clock_get = posix_get_tai,
+ .clock_get_timespec = posix_get_tai,
.nsleep = common_nsleep,
.timer_create = common_timer_create,
.timer_set = common_timer_set,
@@ -1326,7 +1326,7 @@ static const struct k_clock clock_tai = {

static const struct k_clock clock_boottime = {
.clock_getres = posix_get_hrtimer_res,
- .clock_get = posix_get_boottime,
+ .clock_get_timespec = posix_get_boottime,
.nsleep = common_nsleep,
.timer_create = common_timer_create,
.timer_set = common_timer_set,
diff --git a/kernel/time/posix-timers.h b/kernel/time/posix-timers.h
index 897c29e162b9..070611b2c253 100644
--- a/kernel/time/posix-timers.h
+++ b/kernel/time/posix-timers.h
@@ -6,8 +6,8 @@ struct k_clock {
struct timespec64 *tp);
int (*clock_set)(const clockid_t which_clock,
const struct timespec64 *tp);
- int (*clock_get)(const clockid_t which_clock,
- struct timespec64 *tp);
+ int (*clock_get_timespec)(const clockid_t which_clock,
+ struct timespec64 *tp);
int (*clock_adj)(const clockid_t which_clock, struct __kernel_timex *tx);
int (*timer_create)(struct k_itimer *timer);
int (*nsleep)(const clockid_t which_clock, int flags,
--
2.24.0

2019-11-12 01:34:21

by Dmitry Safonov

[permalink] [raw]
Subject: [PATCHv8 01/34] lib/vdso: Add unlikely() hint into vdso_read_begin()

From: Andrei Vagin <[email protected]>

Place the branch with no concurrent write before contended case.

Performance numbers for Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
(more clock_gettime() cycles - the better):
| before | after
-----------------------------------
| 150252214 | 153242367
| 150301112 | 153324800
| 150392773 | 153125401
| 150373957 | 153399355
| 150303157 | 153489417
| 150365237 | 153494270
-----------------------------------
avg | 150331408 | 153345935
diff % | 2 | 0
-----------------------------------
stdev % | 0.3 | 0.1

Signed-off-by: Andrei Vagin <[email protected]>
Co-developed-by: Dmitry Safonov <[email protected]>
Reviewed-by: Vincenzo Frascino <[email protected]>
Tested-by: Vincenzo Frascino <[email protected]>
Signed-off-by: Dmitry Safonov <[email protected]>
---
include/vdso/helpers.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/vdso/helpers.h b/include/vdso/helpers.h
index 01641dbb68ef..9a2af9fca45e 100644
--- a/include/vdso/helpers.h
+++ b/include/vdso/helpers.h
@@ -10,7 +10,7 @@ static __always_inline u32 vdso_read_begin(const struct vdso_data *vd)
{
u32 seq;

- while ((seq = READ_ONCE(vd->seq)) & 1)
+ while (unlikely((seq = READ_ONCE(vd->seq)) & 1))
cpu_relax();

smp_rmb();
--
2.24.0

2019-11-21 18:07:34

by Andrei Vagin

[permalink] [raw]
Subject: Re: [PATCHv8 00/34] kernel: Introduce Time Namespace

Hi Thomas,

What is your plan on this series? We know you are probably busy with
the next merge window. We just want to check that this is still in your
TODO list.

On Tue, Nov 12, 2019 at 01:26:49AM +0000, Dmitry Safonov wrote:
>
> v7..v8 Changes:
> * Fix compile-time errors:
> - on architectures without the support of time namespaces.
> - when CONFIG_POSIX_TIMERS isn't set.
> * Added checks in selftests for CONFIG_POSIX_TIMERS.
> * Inline do_hres and do_coarse.
> (And added Tested-by Vincenzo - thanks!)
> * Make TIME_NS depends on GENERIC_VDSO_TIME_NS and set it per-arch.
>
> [v1..v7 Changelogs is at the very bottom here]
>
> Our performance measurements show that the price of VDSO's clock_gettime()
> in a child time namespace is about 8% with a hot CPU cache and about 90%

Here is a typo. The price of VDSO's clock_gettime() in a child time
namespace is about 12% with a cold CPU cache. The table with
measurements for a cold CPU cache contains correct data.

> with a cold CPU cache. There is no performance regression for host
> processes outside time namespace on those tests.
>

....

>
> Cold CPU cache (lesser tsc per cycle - the better):
>
> | before | CONFIG_TIME_NS=n | host | inside timens
> --------------------------------------------------------------
> tsc | 476 | 480 | 487 | 531
> stdev(tsc) | 0.6 | 1.3 | 4.3 | 5.7
> diff (%) | 100 | 100.9 | 102 | 112
>

2019-12-11 20:40:57

by Dmitry Safonov

[permalink] [raw]
Subject: Re: [PATCHv8 00/34] kernel: Introduce Time Namespace

Gentle ping, in case you have time to look at this.

On Thu, 21 Nov 2019 at 18:05, Andrei Vagin <[email protected]> wrote:
>
> Hi Thomas,
>
> What is your plan on this series? We know you are probably busy with
> the next merge window. We just want to check that this is still in your
> TODO list.
>
> On Tue, Nov 12, 2019 at 01:26:49AM +0000, Dmitry Safonov wrote:
> >
> > v7..v8 Changes:
> > * Fix compile-time errors:
> > - on architectures without the support of time namespaces.
> > - when CONFIG_POSIX_TIMERS isn't set.
> > * Added checks in selftests for CONFIG_POSIX_TIMERS.
> > * Inline do_hres and do_coarse.
> > (And added Tested-by Vincenzo - thanks!)
> > * Make TIME_NS depends on GENERIC_VDSO_TIME_NS and set it per-arch.
> >
> > [v1..v7 Changelogs is at the very bottom here]
> >
> > Our performance measurements show that the price of VDSO's clock_gettime()
> > in a child time namespace is about 8% with a hot CPU cache and about 90%
>
> Here is a typo. The price of VDSO's clock_gettime() in a child time
> namespace is about 12% with a cold CPU cache. The table with
> measurements for a cold CPU cache contains correct data.
>
> > with a cold CPU cache. There is no performance regression for host
> > processes outside time namespace on those tests.
> >
>
> ....
>
> >
> > Cold CPU cache (lesser tsc per cycle - the better):
> >
> > | before | CONFIG_TIME_NS=n | host | inside timens
> > --------------------------------------------------------------
> > tsc | 476 | 480 | 487 | 531
> > stdev(tsc) | 0.6 | 1.3 | 4.3 | 5.7
> > diff (%) | 100 | 100.9 | 102 | 112
> >

Thanks,
Dmitry

2020-01-09 21:12:40

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCHv8 00/34] kernel: Introduce Time Namespace

Dmitry Safonov <[email protected]> writes:

> Gentle ping, in case you have time to look at this.

I'm looking at it and so far I'm quite happy.

Andy, Vincenco any opinions?

Thanks,

tglx

2020-01-10 09:51:26

by Vincenzo Frascino

[permalink] [raw]
Subject: Re: [PATCHv8 00/34] kernel: Introduce Time Namespace

Hi Thomas,

On 1/9/20 9:09 PM, Thomas Gleixner wrote:
> Dmitry Safonov <[email protected]> writes:
>
>> Gentle ping, in case you have time to look at this.
>
> I'm looking at it and so far I'm quite happy.
>
> Andy, Vincenco any opinions?
>

I started looking at them after the holidays, in general I am happy with what I
have seen till now.

I would like to complete some testing especially on the platforms that are not
touched by this patchset to make sure that there are no side effects on the
unified vDSOs and then I think I am ok with the series.

> Thanks,
>
> tglx
>

--
Regards,
Vincenzo

2020-01-13 19:12:38

by tip-bot2 for Jacob Pan

[permalink] [raw]
Subject: [tip: timers/core] lib/vdso: Add unlikely() hint into vdso_read_begin()

The following commit has been merged into the timers/core branch of tip:

Commit-ID: 79472606d6fba66c20299f4121a657ce73efc302
Gitweb: https://git.kernel.org/tip/79472606d6fba66c20299f4121a657ce73efc302
Author: Andrei Vagin <[email protected]>
AuthorDate: Tue, 12 Nov 2019 01:26:50
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Mon, 13 Jan 2020 08:10:46 +01:00

lib/vdso: Add unlikely() hint into vdso_read_begin()

Place the branch with no concurrent write before the contended case.

Performance numbers for Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
(more clock_gettime() cycles - the better):
| before | after
-----------------------------------
| 150252214 | 153242367
| 150301112 | 153324800
| 150392773 | 153125401
| 150373957 | 153399355
| 150303157 | 153489417
| 150365237 | 153494270
-----------------------------------
avg | 150331408 | 153345935
diff % | 2 | 0
-----------------------------------
stdev % | 0.3 | 0.1

Co-developed-by: Dmitry Safonov <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: Dmitry Safonov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Vincenzo Frascino <[email protected]>
Reviewed-by: Vincenzo Frascino <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

---
include/vdso/helpers.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/vdso/helpers.h b/include/vdso/helpers.h
index 01641db..9a2af9f 100644
--- a/include/vdso/helpers.h
+++ b/include/vdso/helpers.h
@@ -10,7 +10,7 @@ static __always_inline u32 vdso_read_begin(const struct vdso_data *vd)
{
u32 seq;

- while ((seq = READ_ONCE(vd->seq)) & 1)
+ while (unlikely((seq = READ_ONCE(vd->seq)) & 1))
cpu_relax();

smp_rmb();

2020-01-13 19:13:39

by tip-bot2 for Jacob Pan

[permalink] [raw]
Subject: [tip: timers/core] posix-clocks: Rename the clock_get() callback to clock_get_timespec()

The following commit has been merged into the timers/core branch of tip:

Commit-ID: 9552f6a3e9fc221e8fdcffbdf7b56df0483daa59
Gitweb: https://git.kernel.org/tip/9552f6a3e9fc221e8fdcffbdf7b56df0483daa59
Author: Andrei Vagin <[email protected]>
AuthorDate: Tue, 12 Nov 2019 01:26:54
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Mon, 13 Jan 2020 08:10:48 +01:00

posix-clocks: Rename the clock_get() callback to clock_get_timespec()

The upcoming support for time namespaces requires to have access to:

- The time in a task's time namespace for sys_clock_gettime()
- The time in the root name space for common_timer_get()

That adds a valid reason to finally implement a separate callback which
returns the time in ktime_t format, rather than in (struct timespec).

Rename the clock_get() callback to clock_get_timespec() as a preparation
for introducing clock_get_ktime().

Suggested-by: Thomas Gleixner <[email protected]>
Co-developed-by: Dmitry Safonov <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: Dmitry Safonov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

---
kernel/time/alarmtimer.c | 4 ++--
kernel/time/posix-clock.c | 8 ++++----
kernel/time/posix-cpu-timers.c | 32 ++++++++++++++++----------------
kernel/time/posix-timers.c | 22 +++++++++++-----------
kernel/time/posix-timers.h | 4 ++--
5 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index 451f9d0..8523df7 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -657,7 +657,7 @@ static int alarm_clock_getres(const clockid_t which_clock, struct timespec64 *tp
}

/**
- * alarm_clock_get - posix clock_get interface
+ * alarm_clock_get - posix clock_get_timespec interface
* @which_clock: clockid
* @tp: timespec to fill.
*
@@ -837,7 +837,7 @@ static int alarm_timer_nsleep(const clockid_t which_clock, int flags,

const struct k_clock alarm_clock = {
.clock_getres = alarm_clock_getres,
- .clock_get = alarm_clock_get,
+ .clock_get_timespec = alarm_clock_get,
.timer_create = alarm_timer_create,
.timer_set = common_timer_set,
.timer_del = common_timer_del,
diff --git a/kernel/time/posix-clock.c b/kernel/time/posix-clock.c
index 200fb2d..77c0c23 100644
--- a/kernel/time/posix-clock.c
+++ b/kernel/time/posix-clock.c
@@ -310,8 +310,8 @@ out:
}

const struct k_clock clock_posix_dynamic = {
- .clock_getres = pc_clock_getres,
- .clock_set = pc_clock_settime,
- .clock_get = pc_clock_gettime,
- .clock_adj = pc_clock_adjtime,
+ .clock_getres = pc_clock_getres,
+ .clock_set = pc_clock_settime,
+ .clock_get_timespec = pc_clock_gettime,
+ .clock_adj = pc_clock_adjtime,
};
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 42d512f..8ff6da7 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -1391,26 +1391,26 @@ static int thread_cpu_timer_create(struct k_itimer *timer)
}

const struct k_clock clock_posix_cpu = {
- .clock_getres = posix_cpu_clock_getres,
- .clock_set = posix_cpu_clock_set,
- .clock_get = posix_cpu_clock_get,
- .timer_create = posix_cpu_timer_create,
- .nsleep = posix_cpu_nsleep,
- .timer_set = posix_cpu_timer_set,
- .timer_del = posix_cpu_timer_del,
- .timer_get = posix_cpu_timer_get,
- .timer_rearm = posix_cpu_timer_rearm,
+ .clock_getres = posix_cpu_clock_getres,
+ .clock_set = posix_cpu_clock_set,
+ .clock_get_timespec = posix_cpu_clock_get,
+ .timer_create = posix_cpu_timer_create,
+ .nsleep = posix_cpu_nsleep,
+ .timer_set = posix_cpu_timer_set,
+ .timer_del = posix_cpu_timer_del,
+ .timer_get = posix_cpu_timer_get,
+ .timer_rearm = posix_cpu_timer_rearm,
};

const struct k_clock clock_process = {
- .clock_getres = process_cpu_clock_getres,
- .clock_get = process_cpu_clock_get,
- .timer_create = process_cpu_timer_create,
- .nsleep = process_cpu_nsleep,
+ .clock_getres = process_cpu_clock_getres,
+ .clock_get_timespec = process_cpu_clock_get,
+ .timer_create = process_cpu_timer_create,
+ .nsleep = process_cpu_nsleep,
};

const struct k_clock clock_thread = {
- .clock_getres = thread_cpu_clock_getres,
- .clock_get = thread_cpu_clock_get,
- .timer_create = thread_cpu_timer_create,
+ .clock_getres = thread_cpu_clock_getres,
+ .clock_get_timespec = thread_cpu_clock_get,
+ .timer_create = thread_cpu_timer_create,
};
diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index 0ec5b7a..44d4f9c 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -667,7 +667,7 @@ void common_timer_get(struct k_itimer *timr, struct itimerspec64 *cur_setting)
* The timespec64 based conversion is suboptimal, but it's not
* worth to implement yet another callback.
*/
- kc->clock_get(timr->it_clock, &ts64);
+ kc->clock_get_timespec(timr->it_clock, &ts64);
now = timespec64_to_ktime(ts64);

/*
@@ -781,7 +781,7 @@ static void common_hrtimer_arm(struct k_itimer *timr, ktime_t expires,
* Posix magic: Relative CLOCK_REALTIME timers are not affected by
* clock modifications, so they become CLOCK_MONOTONIC based under the
* hood. See hrtimer_init(). Update timr->kclock, so the generic
- * functions which use timr->kclock->clock_get() work.
+ * functions which use timr->kclock->clock_get_timespec() work.
*
* Note: it_clock stays unmodified, because the next timer_set() might
* use ABSTIME, so it needs to switch back.
@@ -1067,7 +1067,7 @@ SYSCALL_DEFINE2(clock_gettime, const clockid_t, which_clock,
if (!kc)
return -EINVAL;

- error = kc->clock_get(which_clock, &kernel_tp);
+ error = kc->clock_get_timespec(which_clock, &kernel_tp);

if (!error && put_timespec64(&kernel_tp, tp))
error = -EFAULT;
@@ -1149,7 +1149,7 @@ SYSCALL_DEFINE2(clock_gettime32, clockid_t, which_clock,
if (!kc)
return -EINVAL;

- err = kc->clock_get(which_clock, &ts);
+ err = kc->clock_get_timespec(which_clock, &ts);

if (!err && put_old_timespec32(&ts, tp))
err = -EFAULT;
@@ -1261,7 +1261,7 @@ SYSCALL_DEFINE4(clock_nanosleep_time32, clockid_t, which_clock, int, flags,

static const struct k_clock clock_realtime = {
.clock_getres = posix_get_hrtimer_res,
- .clock_get = posix_clock_realtime_get,
+ .clock_get_timespec = posix_clock_realtime_get,
.clock_set = posix_clock_realtime_set,
.clock_adj = posix_clock_realtime_adj,
.nsleep = common_nsleep,
@@ -1279,7 +1279,7 @@ static const struct k_clock clock_realtime = {

static const struct k_clock clock_monotonic = {
.clock_getres = posix_get_hrtimer_res,
- .clock_get = posix_ktime_get_ts,
+ .clock_get_timespec = posix_ktime_get_ts,
.nsleep = common_nsleep,
.timer_create = common_timer_create,
.timer_set = common_timer_set,
@@ -1295,22 +1295,22 @@ static const struct k_clock clock_monotonic = {

static const struct k_clock clock_monotonic_raw = {
.clock_getres = posix_get_hrtimer_res,
- .clock_get = posix_get_monotonic_raw,
+ .clock_get_timespec = posix_get_monotonic_raw,
};

static const struct k_clock clock_realtime_coarse = {
.clock_getres = posix_get_coarse_res,
- .clock_get = posix_get_realtime_coarse,
+ .clock_get_timespec = posix_get_realtime_coarse,
};

static const struct k_clock clock_monotonic_coarse = {
.clock_getres = posix_get_coarse_res,
- .clock_get = posix_get_monotonic_coarse,
+ .clock_get_timespec = posix_get_monotonic_coarse,
};

static const struct k_clock clock_tai = {
.clock_getres = posix_get_hrtimer_res,
- .clock_get = posix_get_tai,
+ .clock_get_timespec = posix_get_tai,
.nsleep = common_nsleep,
.timer_create = common_timer_create,
.timer_set = common_timer_set,
@@ -1326,7 +1326,7 @@ static const struct k_clock clock_tai = {

static const struct k_clock clock_boottime = {
.clock_getres = posix_get_hrtimer_res,
- .clock_get = posix_get_boottime,
+ .clock_get_timespec = posix_get_boottime,
.nsleep = common_nsleep,
.timer_create = common_timer_create,
.timer_set = common_timer_set,
diff --git a/kernel/time/posix-timers.h b/kernel/time/posix-timers.h
index 897c29e..070611b 100644
--- a/kernel/time/posix-timers.h
+++ b/kernel/time/posix-timers.h
@@ -6,8 +6,8 @@ struct k_clock {
struct timespec64 *tp);
int (*clock_set)(const clockid_t which_clock,
const struct timespec64 *tp);
- int (*clock_get)(const clockid_t which_clock,
- struct timespec64 *tp);
+ int (*clock_get_timespec)(const clockid_t which_clock,
+ struct timespec64 *tp);
int (*clock_adj)(const clockid_t which_clock, struct __kernel_timex *tx);
int (*timer_create)(struct k_itimer *timer);
int (*nsleep)(const clockid_t which_clock, int flags,

2020-01-13 19:13:41

by tip-bot2 for Jacob Pan

[permalink] [raw]
Subject: [tip: timers/core] x86/vdso: On timens page fault prefault also VVAR page

The following commit has been merged into the timers/core branch of tip:

Commit-ID: 4970dc4c31da81447c51a375a906616bf6618f7b
Gitweb: https://git.kernel.org/tip/4970dc4c31da81447c51a375a906616bf6618f7b
Author: Dmitry Safonov <[email protected]>
AuthorDate: Tue, 12 Nov 2019 01:27:14
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Mon, 13 Jan 2020 08:10:57 +01:00

x86/vdso: On timens page fault prefault also VVAR page

As timens page has offsets to data on VVAR page VVAR is going
to be accessed shortly. Set it up with timens in one page fault
as optimization.

Suggested-by: Thomas Gleixner <[email protected]>
Co-developed-by: Andrei Vagin <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: Dmitry Safonov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

---
arch/x86/entry/vdso/vma.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index e5f3361..d2fd8a5 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -170,8 +170,23 @@ static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
* offset.
* See also the comment near timens_setup_vdso_data().
*/
- if (timens_page)
+ if (timens_page) {
+ unsigned long addr;
+ vm_fault_t err;
+
+ /*
+ * Optimization: inside time namespace pre-fault
+ * VVAR page too. As on timens page there are only
+ * offsets for clocks on VVAR, it'll be faulted
+ * shortly by VDSO code.
+ */
+ addr = vmf->address + (image->sym_timens_page - sym_offset);
+ err = vmf_insert_pfn(vma, addr, pfn);
+ if (unlikely(err & VM_FAULT_ERROR))
+ return err;
+
pfn = page_to_pfn(timens_page);
+ }

return vmf_insert_pfn(vma, vmf->address, pfn);
} else if (sym_offset == image->sym_pvclock_page) {

2020-01-14 13:04:35

by tip-bot2 for Jacob Pan

[permalink] [raw]
Subject: [tip: timers/core] posix-clocks: Rename the clock_get() callback to clock_get_timespec()

The following commit has been merged into the timers/core branch of tip:

Commit-ID: 819a95fe3adfc7b558bfd96dd5ac589c4f543fd4
Gitweb: https://git.kernel.org/tip/819a95fe3adfc7b558bfd96dd5ac589c4f543fd4
Author: Andrei Vagin <[email protected]>
AuthorDate: Tue, 12 Nov 2019 01:26:54
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Tue, 14 Jan 2020 12:20:49 +01:00

posix-clocks: Rename the clock_get() callback to clock_get_timespec()

The upcoming support for time namespaces requires to have access to:

- The time in a task's time namespace for sys_clock_gettime()
- The time in the root name space for common_timer_get()

That adds a valid reason to finally implement a separate callback which
returns the time in ktime_t format, rather than in (struct timespec).

Rename the clock_get() callback to clock_get_timespec() as a preparation
for introducing clock_get_ktime().

Suggested-by: Thomas Gleixner <[email protected]>
Co-developed-by: Dmitry Safonov <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: Dmitry Safonov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


---
kernel/time/alarmtimer.c | 4 ++--
kernel/time/posix-clock.c | 8 ++++----
kernel/time/posix-cpu-timers.c | 32 ++++++++++++++++----------------
kernel/time/posix-timers.c | 22 +++++++++++-----------
kernel/time/posix-timers.h | 4 ++--
5 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index 451f9d0..8523df7 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -657,7 +657,7 @@ static int alarm_clock_getres(const clockid_t which_clock, struct timespec64 *tp
}

/**
- * alarm_clock_get - posix clock_get interface
+ * alarm_clock_get - posix clock_get_timespec interface
* @which_clock: clockid
* @tp: timespec to fill.
*
@@ -837,7 +837,7 @@ static int alarm_timer_nsleep(const clockid_t which_clock, int flags,

const struct k_clock alarm_clock = {
.clock_getres = alarm_clock_getres,
- .clock_get = alarm_clock_get,
+ .clock_get_timespec = alarm_clock_get,
.timer_create = alarm_timer_create,
.timer_set = common_timer_set,
.timer_del = common_timer_del,
diff --git a/kernel/time/posix-clock.c b/kernel/time/posix-clock.c
index 200fb2d..77c0c23 100644
--- a/kernel/time/posix-clock.c
+++ b/kernel/time/posix-clock.c
@@ -310,8 +310,8 @@ out:
}

const struct k_clock clock_posix_dynamic = {
- .clock_getres = pc_clock_getres,
- .clock_set = pc_clock_settime,
- .clock_get = pc_clock_gettime,
- .clock_adj = pc_clock_adjtime,
+ .clock_getres = pc_clock_getres,
+ .clock_set = pc_clock_settime,
+ .clock_get_timespec = pc_clock_gettime,
+ .clock_adj = pc_clock_adjtime,
};
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 42d512f..8ff6da7 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -1391,26 +1391,26 @@ static int thread_cpu_timer_create(struct k_itimer *timer)
}

const struct k_clock clock_posix_cpu = {
- .clock_getres = posix_cpu_clock_getres,
- .clock_set = posix_cpu_clock_set,
- .clock_get = posix_cpu_clock_get,
- .timer_create = posix_cpu_timer_create,
- .nsleep = posix_cpu_nsleep,
- .timer_set = posix_cpu_timer_set,
- .timer_del = posix_cpu_timer_del,
- .timer_get = posix_cpu_timer_get,
- .timer_rearm = posix_cpu_timer_rearm,
+ .clock_getres = posix_cpu_clock_getres,
+ .clock_set = posix_cpu_clock_set,
+ .clock_get_timespec = posix_cpu_clock_get,
+ .timer_create = posix_cpu_timer_create,
+ .nsleep = posix_cpu_nsleep,
+ .timer_set = posix_cpu_timer_set,
+ .timer_del = posix_cpu_timer_del,
+ .timer_get = posix_cpu_timer_get,
+ .timer_rearm = posix_cpu_timer_rearm,
};

const struct k_clock clock_process = {
- .clock_getres = process_cpu_clock_getres,
- .clock_get = process_cpu_clock_get,
- .timer_create = process_cpu_timer_create,
- .nsleep = process_cpu_nsleep,
+ .clock_getres = process_cpu_clock_getres,
+ .clock_get_timespec = process_cpu_clock_get,
+ .timer_create = process_cpu_timer_create,
+ .nsleep = process_cpu_nsleep,
};

const struct k_clock clock_thread = {
- .clock_getres = thread_cpu_clock_getres,
- .clock_get = thread_cpu_clock_get,
- .timer_create = thread_cpu_timer_create,
+ .clock_getres = thread_cpu_clock_getres,
+ .clock_get_timespec = thread_cpu_clock_get,
+ .timer_create = thread_cpu_timer_create,
};
diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index 0ec5b7a..44d4f9c 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -667,7 +667,7 @@ void common_timer_get(struct k_itimer *timr, struct itimerspec64 *cur_setting)
* The timespec64 based conversion is suboptimal, but it's not
* worth to implement yet another callback.
*/
- kc->clock_get(timr->it_clock, &ts64);
+ kc->clock_get_timespec(timr->it_clock, &ts64);
now = timespec64_to_ktime(ts64);

/*
@@ -781,7 +781,7 @@ static void common_hrtimer_arm(struct k_itimer *timr, ktime_t expires,
* Posix magic: Relative CLOCK_REALTIME timers are not affected by
* clock modifications, so they become CLOCK_MONOTONIC based under the
* hood. See hrtimer_init(). Update timr->kclock, so the generic
- * functions which use timr->kclock->clock_get() work.
+ * functions which use timr->kclock->clock_get_timespec() work.
*
* Note: it_clock stays unmodified, because the next timer_set() might
* use ABSTIME, so it needs to switch back.
@@ -1067,7 +1067,7 @@ SYSCALL_DEFINE2(clock_gettime, const clockid_t, which_clock,
if (!kc)
return -EINVAL;

- error = kc->clock_get(which_clock, &kernel_tp);
+ error = kc->clock_get_timespec(which_clock, &kernel_tp);

if (!error && put_timespec64(&kernel_tp, tp))
error = -EFAULT;
@@ -1149,7 +1149,7 @@ SYSCALL_DEFINE2(clock_gettime32, clockid_t, which_clock,
if (!kc)
return -EINVAL;

- err = kc->clock_get(which_clock, &ts);
+ err = kc->clock_get_timespec(which_clock, &ts);

if (!err && put_old_timespec32(&ts, tp))
err = -EFAULT;
@@ -1261,7 +1261,7 @@ SYSCALL_DEFINE4(clock_nanosleep_time32, clockid_t, which_clock, int, flags,

static const struct k_clock clock_realtime = {
.clock_getres = posix_get_hrtimer_res,
- .clock_get = posix_clock_realtime_get,
+ .clock_get_timespec = posix_clock_realtime_get,
.clock_set = posix_clock_realtime_set,
.clock_adj = posix_clock_realtime_adj,
.nsleep = common_nsleep,
@@ -1279,7 +1279,7 @@ static const struct k_clock clock_realtime = {

static const struct k_clock clock_monotonic = {
.clock_getres = posix_get_hrtimer_res,
- .clock_get = posix_ktime_get_ts,
+ .clock_get_timespec = posix_ktime_get_ts,
.nsleep = common_nsleep,
.timer_create = common_timer_create,
.timer_set = common_timer_set,
@@ -1295,22 +1295,22 @@ static const struct k_clock clock_monotonic = {

static const struct k_clock clock_monotonic_raw = {
.clock_getres = posix_get_hrtimer_res,
- .clock_get = posix_get_monotonic_raw,
+ .clock_get_timespec = posix_get_monotonic_raw,
};

static const struct k_clock clock_realtime_coarse = {
.clock_getres = posix_get_coarse_res,
- .clock_get = posix_get_realtime_coarse,
+ .clock_get_timespec = posix_get_realtime_coarse,
};

static const struct k_clock clock_monotonic_coarse = {
.clock_getres = posix_get_coarse_res,
- .clock_get = posix_get_monotonic_coarse,
+ .clock_get_timespec = posix_get_monotonic_coarse,
};

static const struct k_clock clock_tai = {
.clock_getres = posix_get_hrtimer_res,
- .clock_get = posix_get_tai,
+ .clock_get_timespec = posix_get_tai,
.nsleep = common_nsleep,
.timer_create = common_timer_create,
.timer_set = common_timer_set,
@@ -1326,7 +1326,7 @@ static const struct k_clock clock_tai = {

static const struct k_clock clock_boottime = {
.clock_getres = posix_get_hrtimer_res,
- .clock_get = posix_get_boottime,
+ .clock_get_timespec = posix_get_boottime,
.nsleep = common_nsleep,
.timer_create = common_timer_create,
.timer_set = common_timer_set,
diff --git a/kernel/time/posix-timers.h b/kernel/time/posix-timers.h
index 897c29e..070611b 100644
--- a/kernel/time/posix-timers.h
+++ b/kernel/time/posix-timers.h
@@ -6,8 +6,8 @@ struct k_clock {
struct timespec64 *tp);
int (*clock_set)(const clockid_t which_clock,
const struct timespec64 *tp);
- int (*clock_get)(const clockid_t which_clock,
- struct timespec64 *tp);
+ int (*clock_get_timespec)(const clockid_t which_clock,
+ struct timespec64 *tp);
int (*clock_adj)(const clockid_t which_clock, struct __kernel_timex *tx);
int (*timer_create)(struct k_itimer *timer);
int (*nsleep)(const clockid_t which_clock, int flags,

2020-01-14 13:05:39

by tip-bot2 for Jacob Pan

[permalink] [raw]
Subject: [tip: timers/core] lib/vdso: Add unlikely() hint into vdso_read_begin()

The following commit has been merged into the timers/core branch of tip:

Commit-ID: 0898a16a362d436464b34fa644d0d46efc81df92
Gitweb: https://git.kernel.org/tip/0898a16a362d436464b34fa644d0d46efc81df92
Author: Andrei Vagin <[email protected]>
AuthorDate: Tue, 12 Nov 2019 01:26:50
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Tue, 14 Jan 2020 12:20:47 +01:00

lib/vdso: Add unlikely() hint into vdso_read_begin()

Place the branch with no concurrent write before the contended case.

Performance numbers for Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
(more clock_gettime() cycles - the better):
| before | after
-----------------------------------
| 150252214 | 153242367
| 150301112 | 153324800
| 150392773 | 153125401
| 150373957 | 153399355
| 150303157 | 153489417
| 150365237 | 153494270
-----------------------------------
avg | 150331408 | 153345935
diff % | 2 | 0
-----------------------------------
stdev % | 0.3 | 0.1

Co-developed-by: Dmitry Safonov <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: Dmitry Safonov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Vincenzo Frascino <[email protected]>
Reviewed-by: Vincenzo Frascino <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


---
include/vdso/helpers.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/vdso/helpers.h b/include/vdso/helpers.h
index 01641db..9a2af9f 100644
--- a/include/vdso/helpers.h
+++ b/include/vdso/helpers.h
@@ -10,7 +10,7 @@ static __always_inline u32 vdso_read_begin(const struct vdso_data *vd)
{
u32 seq;

- while ((seq = READ_ONCE(vd->seq)) & 1)
+ while (unlikely((seq = READ_ONCE(vd->seq)) & 1))
cpu_relax();

smp_rmb();

2020-01-14 13:07:47

by tip-bot2 for Jacob Pan

[permalink] [raw]
Subject: [tip: timers/core] x86/vdso: On timens page fault prefault also VVAR page

The following commit has been merged into the timers/core branch of tip:

Commit-ID: e6b28ec65b6d433624a2c290073bc356c4fce914
Gitweb: https://git.kernel.org/tip/e6b28ec65b6d433624a2c290073bc356c4fce914
Author: Dmitry Safonov <[email protected]>
AuthorDate: Tue, 12 Nov 2019 01:27:14
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Tue, 14 Jan 2020 12:20:59 +01:00

x86/vdso: On timens page fault prefault also VVAR page

As timens page has offsets to data on VVAR page VVAR is going
to be accessed shortly. Set it up with timens in one page fault
as optimization.

Suggested-by: Thomas Gleixner <[email protected]>
Co-developed-by: Andrei Vagin <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: Dmitry Safonov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


---
arch/x86/entry/vdso/vma.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index e5f3361..d2fd8a5 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -170,8 +170,23 @@ static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
* offset.
* See also the comment near timens_setup_vdso_data().
*/
- if (timens_page)
+ if (timens_page) {
+ unsigned long addr;
+ vm_fault_t err;
+
+ /*
+ * Optimization: inside time namespace pre-fault
+ * VVAR page too. As on timens page there are only
+ * offsets for clocks on VVAR, it'll be faulted
+ * shortly by VDSO code.
+ */
+ addr = vmf->address + (image->sym_timens_page - sym_offset);
+ err = vmf_insert_pfn(vma, addr, pfn);
+ if (unlikely(err & VM_FAULT_ERROR))
+ return err;
+
pfn = page_to_pfn(timens_page);
+ }

return vmf_insert_pfn(vma, vmf->address, pfn);
} else if (sym_offset == image->sym_pvclock_page) {