2018-03-17 14:32:20

by Jason Vas Dias

[permalink] [raw]
Subject: [PATCH v4.16-rc5 (2)] x86/vdso: VDSO should handle clock_gettime(CLOCK_MONOTONIC_RAW) without syscall



Resent to address reviewer comments, and allow builds with compilers
that support -DRETPOLINE to succeed.

Currently, the VDSO does not handle
clock_gettime( CLOCK_MONOTONIC_RAW, &ts )
on Intel / AMD - it calls
vdso_fallback_gettime()
for this clock, which issues a syscall, having an unacceptably high
latency (minimum measurable time or time between measurements)
of 300-700ns on 2 2.8-3.9ghz Haswell x86_64 Family'_'Model : 06_3C
machines under various versions of Linux.

Sometimes, particularly when correlating elapsed time to performance
counter values, user-space code needs to know elapsed time from the
perspective of the CPU no matter how "hot" / fast or "cold" / slow it
might be running wrt NTP / PTP "real" time; when code needs this,
the latencies associated with a syscall are often unacceptably high.

I reported this as Bug #198161 :
'https://bugzilla.kernel.org/show_bug.cgi?id=198961'
and in previous posts with subjects matching 'CLOCK_MONOTONIC_RAW' .

This patch handles CLOCK_MONOTONIC_RAW clock_gettime() in the VDSO ,
by exporting the raw clock calibration, last cycles, last xtime_nsec,
and last raw_sec value in the vsyscall_gtod_data during vsyscall_update() .

Now the new do_monotonic_raw() function in the vDSO has a latency of @ 20ns
on average, and the test program:
tools/testing/selftest/timers/inconsistency-check.c
succeeds with arguments: '-c 4 -t 120' or any arbitrary -t value.

The patch is against Linus' latest 4.16-rc5 tree,
current HEAD of :
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
.

This patch affects only files:

arch/x86/include/asm/vgtod.h
arch/x86/entry/vdso/vclock_gettime.c
arch/x86/entry/vsyscall/vsyscall_gtod.c
arch/x86/entry/vdso/Makefile

Patches for kernels 3.10.0-21 and 4.9.65-rt23 (ARM) are attached to bug #198161,
as is the test program, timer_latency.c, to demonstrate the problem.

Before the patch a latency of 200-1000ns was measured for
clock_gettime(CLOCK_MONOTONIC_RAW,&ts)
calls - after the patch, the same call on the same machine
has a latency of @ 20ns.


Thanks & Best Regards,
Jason Vas Dias


2018-03-17 14:31:24

by Jason Vas Dias

[permalink] [raw]
Subject: [PATCH v4.16-rc5 1/2] x86/vdso: VDSO should handle clock_gettime(CLOCK_MONOTONIC_RAW) without syscall


This patch makes the vDSO handle clock_gettime(CLOCK_MONOTONIC_RAW,&ts)
calls in the same way it handles clock_gettime(CLOCK_MONOTONIC,&ts) calls,
reducing latency from @ 200-1000ns to @ 20ns.


diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
index f19856d..843b0a6 100644
--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -182,27 +182,49 @@ notrace static u64 vread_tsc(void)
return last;
}

-notrace static inline u64 vgetsns(int *mode)
+notrace static inline __always_inline u64 vgetcycles(int *mode)
{
- u64 v;
- cycles_t cycles;
-
- if (gtod->vclock_mode == VCLOCK_TSC)
- cycles = vread_tsc();
+ switch (gtod->vclock_mode) {
+ case VCLOCK_TSC:
+ return vread_tsc();
#ifdef CONFIG_PARAVIRT_CLOCK
- else if (gtod->vclock_mode == VCLOCK_PVCLOCK)
- cycles = vread_pvclock(mode);
+ case VCLOCK_PVCLOCK:
+ return vread_pvclock(mode);
#endif
#ifdef CONFIG_HYPERV_TSCPAGE
- else if (gtod->vclock_mode == VCLOCK_HVCLOCK)
- cycles = vread_hvclock(mode);
+ case VCLOCK_HVCLOCK:
+ return vread_hvclock(mode);
#endif
- else
+ default:
+ break;
+ }
+ return 0;
+}
+
+notrace static inline u64 vgetsns(int *mode)
+{
+ u64 v;
+ cycles_t cycles = vgetcycles(mode);
+
+ if (cycles == 0)
return 0;
+
v = (cycles - gtod->cycle_last) & gtod->mask;
return v * gtod->mult;
}

+notrace static inline u64 vgetsns_raw(int *mode)
+{
+ u64 v;
+ cycles_t cycles = vgetcycles(mode);
+
+ if (cycles == 0)
+ return 0;
+
+ v = (cycles - gtod->cycle_last) & gtod->mask;
+ return v * gtod->raw_mult;
+}
+
/* Code size doesn't matter (vdso is 4k anyway) and this is faster. */
notrace static int __always_inline do_realtime(struct timespec *ts)
{
@@ -246,6 +268,27 @@ notrace static int __always_inline do_monotonic(struct timespec *ts)
return mode;
}

+notrace static __always_inline int do_monotonic_raw(struct timespec *ts)
+{
+ unsigned long seq;
+ u64 ns;
+ int mode;
+
+ do {
+ seq = gtod_read_begin(gtod);
+ mode = gtod->vclock_mode;
+ ts->tv_sec = gtod->monotonic_time_raw_sec;
+ ns = gtod->monotonic_time_raw_nsec;
+ ns += vgetsns_raw(&mode);
+ ns >>= gtod->raw_shift;
+ } while (unlikely(gtod_read_retry(gtod, seq)));
+
+ ts->tv_sec += __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
+ ts->tv_nsec = ns;
+
+ return mode;
+}
+
notrace static void do_realtime_coarse(struct timespec *ts)
{
unsigned long seq;
@@ -277,6 +320,10 @@ notrace int __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
if (do_monotonic(ts) == VCLOCK_NONE)
goto fallback;
break;
+ case CLOCK_MONOTONIC_RAW:
+ if (do_monotonic_raw(ts) == VCLOCK_NONE)
+ goto fallback;
+ break;
case CLOCK_REALTIME_COARSE:
do_realtime_coarse(ts);
break;
diff --git a/arch/x86/entry/vsyscall/vsyscall_gtod.c b/arch/x86/entry/vsyscall/vsyscall_gtod.c
index e1216dd..c4d89b6 100644
--- a/arch/x86/entry/vsyscall/vsyscall_gtod.c
+++ b/arch/x86/entry/vsyscall/vsyscall_gtod.c
@@ -44,6 +44,8 @@ void update_vsyscall(struct timekeeper *tk)
vdata->mask = tk->tkr_mono.mask;
vdata->mult = tk->tkr_mono.mult;
vdata->shift = tk->tkr_mono.shift;
+ vdata->raw_mult = tk->tkr_raw.mult;
+ vdata->raw_shift = tk->tkr_raw.shift;

vdata->wall_time_sec = tk->xtime_sec;
vdata->wall_time_snsec = tk->tkr_mono.xtime_nsec;
@@ -74,5 +76,8 @@ void update_vsyscall(struct timekeeper *tk)
vdata->monotonic_time_coarse_sec++;
}

+ vdata->monotonic_time_raw_sec = tk->raw_sec;
+ vdata->monotonic_time_raw_nsec = tk->tkr_raw.xtime_nsec;
+
gtod_write_end(vdata);
}
diff --git a/arch/x86/include/asm/vgtod.h b/arch/x86/include/asm/vgtod.h
index fb856c9..ec1a37c 100644
--- a/arch/x86/include/asm/vgtod.h
+++ b/arch/x86/include/asm/vgtod.h
@@ -22,7 +22,8 @@ struct vsyscall_gtod_data {
u64 mask;
u32 mult;
u32 shift;
-
+ u32 raw_mult;
+ u32 raw_shift;
/* open coded 'struct timespec' */
u64 wall_time_snsec;
gtod_long_t wall_time_sec;
@@ -32,6 +33,8 @@ struct vsyscall_gtod_data {
gtod_long_t wall_time_coarse_nsec;
gtod_long_t monotonic_time_coarse_sec;
gtod_long_t monotonic_time_coarse_nsec;
+ gtod_long_t monotonic_time_raw_sec;
+ gtod_long_t monotonic_time_raw_nsec;

int tz_minuteswest;
int tz_dsttime;

2018-03-17 14:31:34

by Jason Vas Dias

[permalink] [raw]
Subject: [PATCH v4.16-rc5 2/2] x86/vdso: VDSO should handle clock_gettime(CLOCK_MONOTONIC_RAW) without syscall

This patch allows compilation to succeed with compilers that support -DRETPOLINE -
it was kindly contributed by H.J. Liu in GCC Bugzilla: 84908 :
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84908

Apparently the GCC retpoline implementation has a limitation that it cannot
handle switch statements with more than 5 clauses, which vclock_gettime.c's
__vdso_clock_gettime function now contains.

The automated test builds should now succeed with this patch.


diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 1943aeb..cb64e10 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -76,7 +76,7 @@ CFL := $(PROFILING) -mcmodel=small -fPIC -O2 -fasynchronous-unwind-tables -m64 \
-fno-omit-frame-pointer -foptimize-sibling-calls \
-DDISABLE_BRANCH_PROFILING -DBUILD_VDSO

-$(vobjs): KBUILD_CFLAGS := $(filter-out $(GCC_PLUGINS_CFLAGS),$(KBUILD_CFLAGS)) $(CFL)
+$(vobjs): KBUILD_CFLAGS := $(filter-out $(GCC_PLUGINS_CFLAGS) $(RETPOLINE_CFLAGS) -DRETPOLINE,$(KBUILD_CFLAGS)) $(CFL)

#
# vDSO code runs in userspace and -pg doesn't help with profiling anyway.
@@ -143,6 +143,7 @@ KBUILD_CFLAGS_32 := $(filter-out -mcmodel=kernel,$(KBUILD_CFLAGS_32))
KBUILD_CFLAGS_32 := $(filter-out -fno-pic,$(KBUILD_CFLAGS_32))
KBUILD_CFLAGS_32 := $(filter-out -mfentry,$(KBUILD_CFLAGS_32))
KBUILD_CFLAGS_32 := $(filter-out $(GCC_PLUGINS_CFLAGS),$(KBUILD_CFLAGS_32))
+KBUILD_CFLAGS_32 := $(filter-out $(RETPOLINE_CFLAGS) -DRETPOLINE,$(KBUILD_CFLAGS_32))
KBUILD_CFLAGS_32 += -m32 -msoft-float -mregparm=0 -fpic
KBUILD_CFLAGS_32 += $(call cc-option, -fno-stack-protector)
KBUILD_CFLAGS_32 += $(call cc-option, -foptimize-sibling-calls)

2018-03-17 22:54:23

by Jason Vas Dias

[permalink] [raw]
Subject: re: [PATCH v4.16-rc5 (2)] x86/vdso: VDSO should handle clock_gettime(CLOCK_MONOTONIC_RAW) without syscall

Hi -

I submitted a new stripped-down to bare essentials version of
the patch, (see LKML emails with $subject) which passes all
checkpatch.pl tests and addresses all concerns raised by reviewers,
which uses only rdtsc_ordered(), and which only only updates in
vsyscall_gtod_data the new fields:
u32 raw_mult, raw_shift ; ...
gtod_long_t monotonic_time_raw_sec /* == tk->raw_sec */ ,
monotonic_time_raw_nsec /* == tk->tkr_raw.nsec */;
(this is NOT the formatting used in vgtod.h - sorry about previous
formatting issues .
) .

I don't see how one could present the raw timespec in user-space
properly without tk->tkr_raw.xtime_nsec and and tk->raw_sec ;
monotonic has gtod->monotonic_time_sec and gtod->monotonic_time_snsec,
and I am only trying to follow exactly the existing algorithm in
timekeeping.c's
getrawmonotonic64() .

When I submitted the initial version of this stripped down patch,
I got an email back from robot<[email protected]> reporting a compilation
error saying :

>
> arch/x86/entry/vdso/vclock_gettime.o: In function `__vdso_clock_gettime':
> vclock_gettime.c:(.text+0xf7): undefined reference to >`__x86_indirect_thunk_rax'
> /usr/bin/ld: arch/x86/entry/vdso/vclock_gettime.o: relocation R_X86_64_PC32 >against undefined symbol `__x86_indirect_thunk_rax' can not be used when making >a shared object; recompile with -fPIC
> /usr/bin/ld: final link failed: Bad value
>>> collect2: error: ld returned 1 exit status
>--
>>> arch/x86/entry/vdso/vdso32.so.dbg: undefined symbols found
>--
>>> objcopy: 'arch/x86/entry/vdso/vdso64.so.dbg': No such file
>---


I had fixed this problem with the patch to the RHEL kernel attached to
bug #198161 (attachment #274751:
https://bugzilla.kernel.org/attachment.cgi?id=274751) ,
by simply reducing the number of clauses in __vdso_clock_gettime's
switch(clock) from 6 to 5 , but at the cost of an extra test of clock
& second switch(clock).

I reported this as GCC bug :
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84908
because I don't think GCC should fail to do anything
for a switch with 6 clauses and not for one with 5, but
the response I got from H.J. Liu was:

H.J. Lu wrote @ 2018-03-16 22:13:27 UTC:
>
> vDSO isn't compiled with $(KBUILD_CFLAGS). Why does your kernel do it?
>
> Please try my kernel patch at comment 4..
>

So that patch to the arch/x86/vdso/Makefile only prevents it enabling the
RETPOLINE_CFLAGS for building the vDSO .

I defer to H.J.'s expertise on GCC + binutils & advisability of enabling
RETPOLINE_CFLAGS in the VDSO - GCC definitely behaves strangely
for the vDSO when RETPOLINE _CFLAGS are enabled.

Please provide something like the patch in a future version of Linux ,
and I suggest not compiling the vDSO with RETPOLINE_CFLAGS
as does H.J. .

The inconsistency_check program in tools/testing/selftests/timers produces
no errors for long runs and the timer_latency.c program (attached) also
produces no errors and latencies of @ 20ns for CLOCK_MONOTONIC_RAW
and latencies of @ 40ns for CLOCK_MONOTONIC - this is however
with the additional rdtscp patches , and under 4.15.9, for use on my system ;
the 4.16-rc5 version submitted still uses barrier() + rdtsc , and
that has a latency
of @ 30ns for CLOCK_MONOTONIC_RAW and @40ns for CLOCK_MONOTONIC ; but
both are much, much better that
200-1000ns for CLOCK_MONOTONIC_RAW that the unpatched
kernels have (all times refer to 'Average Latency' output produced
by 'timer_latency.c').

I do apologize for whitespace errors, unread emails and resends and confusion
of previous emails - I now understand the process and standards much better
and will attempt to adhere to them more closely in future.

Thanks & Best Regards,
Jason Vas Dias


Attachments:
timer_latency.c (3.52 kB)

2018-03-17 23:01:37

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH v4.16-rc5 2/2] x86/vdso: VDSO should handle clock_gettime(CLOCK_MONOTONIC_RAW) without syscall

On Sat, Mar 17, 2018 at 02:29:34PM +0000, [email protected] wrote:
> This patch allows compilation to succeed with compilers that support -DRETPOLINE -
> it was kindly contributed by H.J. Liu in GCC Bugzilla: 84908 :
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84908
>
> Apparently the GCC retpoline implementation has a limitation that it cannot
> handle switch statements with more than 5 clauses, which vclock_gettime.c's
> __vdso_clock_gettime function now conts.

That's quite a mischaracterization of the issue. gcc works as intended,
but the kernel did not correctly supply a indirect call retpoline thunk
to the vdso, and it just happened to work by accident with the old
vdso.

>
> The automated test builds should now succeed with this patch.

How about just adding the thunk function to the vdso object instead of
this cheap hack?

The other option would be to build vdso with inline thunks.

But just disabling is completely the wrong action.

-Andi


>
>
> diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> index 1943aeb..cb64e10 100644
> --- a/arch/x86/entry/vdso/Makefile
> +++ b/arch/x86/entry/vdso/Makefile
> @@ -76,7 +76,7 @@ CFL := $(PROFILING) -mcmodel=small -fPIC -O2 -fasynchronous-unwind-tables -m64 \
> -fno-omit-frame-pointer -foptimize-sibling-calls \
> -DDISABLE_BRANCH_PROFILING -DBUILD_VDSO
>
> -$(vobjs): KBUILD_CFLAGS := $(filter-out $(GCC_PLUGINS_CFLAGS),$(KBUILD_CFLAGS)) $(CFL)
> +$(vobjs): KBUILD_CFLAGS := $(filter-out $(GCC_PLUGINS_CFLAGS) $(RETPOLINE_CFLAGS) -DRETPOLINE,$(KBUILD_CFLAGS)) $(CFL)
>
> #
> # vDSO code runs in userspace and -pg doesn't help with profiling anyway.
> @@ -143,6 +143,7 @@ KBUILD_CFLAGS_32 := $(filter-out -mcmodel=kernel,$(KBUILD_CFLAGS_32))
> KBUILD_CFLAGS_32 := $(filter-out -fno-pic,$(KBUILD_CFLAGS_32))
> KBUILD_CFLAGS_32 := $(filter-out -mfentry,$(KBUILD_CFLAGS_32))
> KBUILD_CFLAGS_32 := $(filter-out $(GCC_PLUGINS_CFLAGS),$(KBUILD_CFLAGS_32))
> +KBUILD_CFLAGS_32 := $(filter-out $(RETPOLINE_CFLAGS) -DRETPOLINE,$(KBUILD_CFLAGS_32))
> KBUILD_CFLAGS_32 += -m32 -msoft-float -mregparm=0 -fpic
> KBUILD_CFLAGS_32 += $(call cc-option, -fno-stack-protector)
> KBUILD_CFLAGS_32 += $(call cc-option, -foptimize-sibling-calls)
>

2018-03-17 23:40:07

by Jason Vas Dias

[permalink] [raw]
Subject: Re: [PATCH v4.16-rc5 (2)] x86/vdso: VDSO should handle clock_gettime(CLOCK_MONOTONIC_RAW) without syscall

fixed typo in timer_latency.c affecting only -r <repeat> printout
:

$ gcc -DN_SAMPLES=1000 -o timer timer_latency.c
CLOCK_MONOTONIC ( using rdtscp_ordered() ) :

$ ./timer -m -r 10
sum: 67615
Total time: 0.000067615S - Average Latency: 0.000000067S N zero
deltas: 0 N inconsistent deltas: 0
sum: 51858
Total time: 0.000051858S - Average Latency: 0.000000051S N zero
deltas: 0 N inconsistent deltas: 0
sum: 51742
Total time: 0.000051742S - Average Latency: 0.000000051S N zero
deltas: 0 N inconsistent deltas: 0
sum: 51944
Total time: 0.000051944S - Average Latency: 0.000000051S N zero
deltas: 0 N inconsistent deltas: 0
sum: 51838
Total time: 0.000051838S - Average Latency: 0.000000051S N zero
deltas: 0 N inconsistent deltas: 0
sum: 52397
Total time: 0.000052397S - Average Latency: 0.000000052S N zero
deltas: 0 N inconsistent deltas: 0
sum: 52428
Total time: 0.000052428S - Average Latency: 0.000000052S N zero
deltas: 0 N inconsistent deltas: 0
sum: 52135
Total time: 0.000052135S - Average Latency: 0.000000052S N zero
deltas: 0 N inconsistent deltas: 0
sum: 52145
Total time: 0.000052145S - Average Latency: 0.000000052S N zero
deltas: 0 N inconsistent deltas: 0
sum: 53116
Total time: 0.000053116S - Average Latency: 0.000000053S N zero
deltas: 0 N inconsistent deltas: 0
Average of 10 average latencies of 1000 samples : 0.000000053S


CLOCK_MONOTONIC_RAW ( using rdtscp() ) :

$ ./timer -r 10
sum: 25755
Total time: 0.000025755S - Average Latency: 0.000000025S N zero
deltas: 0 N inconsistent deltas: 0
sum: 21614
Total time: 0.000021614S - Average Latency: 0.000000021S N zero
deltas: 0 N inconsistent deltas: 0
sum: 21616
Total time: 0.000021616S - Average Latency: 0.000000021S N zero
deltas: 0 N inconsistent deltas: 0
sum: 21610
Total time: 0.000021610S - Average Latency: 0.000000021S N zero
deltas: 0 N inconsistent deltas: 0
sum: 21619
Total time: 0.000021619S - Average Latency: 0.000000021S N zero
deltas: 0 N inconsistent deltas: 0
sum: 21617
Total time: 0.000021617S - Average Latency: 0.000000021S N zero
deltas: 0 N inconsistent deltas: 0
sum: 21610
Total time: 0.000021610S - Average Latency: 0.000000021S N zero
deltas: 0 N inconsistent deltas: 0
sum: 16940
Total time: 0.000016940S - Average Latency: 0.000000016S N zero
deltas: 0 N inconsistent deltas: 0
sum: 16939
Total time: 0.000016939S - Average Latency: 0.000000016S N zero
deltas: 0 N inconsistent deltas: 0
sum: 16943
Total time: 0.000016943S - Average Latency: 0.000000016S N zero
deltas: 0 N inconsistent deltas: 0
Average of 10 average latencies of 1000 samples : 0.000000019S


Attachments:
timer_latency.c (3.51 kB)

2018-03-18 03:53:19

by Jason Vas Dias

[permalink] [raw]
Subject: Re: [PATCH v4.16-rc5 2/2] x86/vdso: VDSO should handle clock_gettime(CLOCK_MONOTONIC_RAW) without syscall

On 18/03/2018, Jason Vas Dias <[email protected]> wrote:
(should have CC'ed to list, sorry)
> On 17/03/2018, Andi Kleen <[email protected]> wrote:
>>
>> That's quite a mischaracterization of the issue. gcc works as intended,
>> but the kernel did not correctly supply a indirect call retpoline thunk
>> to the vdso, and it just happened to work by accident with the old
>> vdso.
>>
>>>
>>> The automated test builds should now succeed with this patch.
>>
>> How about just adding the thunk function to the vdso object instead of
>> this cheap hack?
>>
>> The other option would be to build vdso with inline thunks.
>>
>> But just disabling is completely the wrong action.
>>
>> -Andi
>>
>
> Aha! Thanks for the clarification , Andi!
>
> I will do so and resend the 2nd patch.
>
> But is everyone agreed we should accept any slowdown for the timer
> functions ? I personally don't think it is a good idea, but I will
> regenerate the patch with the thunk function and without
> the Makefile change.
>
> Thanks & Best Regards,
> Jason
>


I am wondering if it is not better to avoid the thunk being generated
and remove the Makefile patch ?

I know that changing the switch in __vdso_clock_gettime() like
this avoids the thunk :

switch(clock) {
case CLOCK_MONOTONIC:
if (do_monotonic(ts) == VCLOCK_NONE)
goto fallback;
break;
default:
switch (clock) {
case CLOCK_REALTIME:
if (do_realtime(ts) == VCLOCK_NONE)
goto fallback;
break;
case CLOCK_MONOTONIC_RAW:
if (do_monotonic_raw(ts) == VCLOCK_NONE)
goto fallback;
break;
case CLOCK_REALTIME_COARSE:
do_realtime_coarse(ts);
break;
case CLOCK_MONOTONIC_COARSE:
do_monotonic_coarse(ts);
break;
default:
goto fallback;
}
return 0;
fallback: ...
}


So at the cost of an unnecessary extra test of the clock parameter,
the thunk is avoided .

I wonder if the whole switch should be changed to an if / else clause ?

Or, I know this might be unorthodox, but might work :
#define _CAT(V1,V2) V1##V2
#define GTOD_CLK_LABEL(CLK) _CAT(_VCG_L_,CLK)
#define MAX_CLK 16
//^^ ??
__vdso_clock_gettime( ... ) { ...
static const void *clklbl_tab[MAX_CLK]
={ [ CLOCK_MONOTONIC ]
= &&GTOD_CLK_LABEL(CLOCK_MONOTONIC) ,
[ CLOCK_MONOTONIC_RAW ]
= &&GTOD_CLK_LABEL(CLOCK_MONOTONIC_RAW) ,
// and similarly for all clocks handled ...
};

goto clklbl_tab[ clock & 0xf ] ;

GTOD_CLK_LABEL(CLOCK_MONOTONIC) :
if ( do_monotonic(ts) == VCLOCK_NONE )
goto fallback ;

GTOD_CLK_LABEL(CLOCK_MONOTONIC_RAW) :
if ( do_monotonic_raw(ts) == VCLOCK_NONE )
goto fallback ;

... // similarly for all clocks

fallback:
return vdso_fallback_gettime(clock,ts);
}


If a restructuring like that might be acceptable (with correct tab-based
formatting) , and the VDSO can have such a table in its .BSS , I think it
would avoid the thunk, and have the advantage of
precomputing the jump table at compile-time, and would not require any
indirect branches, I think.

Any thoughts ?

Thanks & Best regards,
Jason




;

G

2018-03-19 21:44:14

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v4.16-rc5 1/2] x86/vdso: VDSO should handle clock_gettime(CLOCK_MONOTONIC_RAW) without syscall

Hi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on v4.16-rc4]
[also build test ERROR on next-20180316]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/jason-vas-dias-gmail-com/x86-vdso-VDSO-should-handle-clock_gettime-CLOCK_MONOTONIC_RAW-without-syscall/20180319-021527
config: x86_64-randconfig-v0-03200333 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64

Note: the linux-review/jason-vas-dias-gmail-com/x86-vdso-VDSO-should-handle-clock_gettime-CLOCK_MONOTONIC_RAW-without-syscall/20180319-021527 HEAD e19190a6cf1c0e065fe6e3b1b3073e1586d0b6a9 builds fine.
It only hurts bisectibility.

All errors (new ones prefixed by >>):

arch/x86/entry/vdso/vclock_gettime.o: In function `__vdso_clock_gettime':
>> arch/x86/entry/vdso/vclock_gettime.c:314: undefined reference to `__x86_indirect_thunk_rax'
/usr/bin/ld: arch/x86/entry/vdso/vclock_gettime.o: relocation R_X86_64_PC32 against undefined symbol `__x86_indirect_thunk_rax' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Bad value
>> collect2: error: ld returned 1 exit status
--
>> objcopy: 'arch/x86/entry/vdso/vdso64.so.dbg': No such file
--
arch/x86//entry/vdso/vclock_gettime.o: In function `__vdso_clock_gettime':
arch/x86//entry/vdso/vclock_gettime.c:314: undefined reference to `__x86_indirect_thunk_rax'
/usr/bin/ld: arch/x86//entry/vdso/vclock_gettime.o: relocation R_X86_64_PC32 against undefined symbol `__x86_indirect_thunk_rax' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Bad value
>> collect2: error: ld returned 1 exit status

vim +314 arch/x86/entry/vdso/vclock_gettime.c

da15cfda arch/x86/vdso/vclock_gettime.c John Stultz 2009-08-19 311
23adec55 arch/x86/vdso/vclock_gettime.c Steven Rostedt 2008-05-12 312 notrace int __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
2aae950b arch/x86_64/vdso/vclock_gettime.c Andi Kleen 2007-07-21 313 {
2aae950b arch/x86_64/vdso/vclock_gettime.c Andi Kleen 2007-07-21 @314 switch (clock) {
2aae950b arch/x86_64/vdso/vclock_gettime.c Andi Kleen 2007-07-21 315 case CLOCK_REALTIME:
ce39c640 arch/x86/vdso/vclock_gettime.c Stefani Seibold 2014-03-17 316 if (do_realtime(ts) == VCLOCK_NONE)
ce39c640 arch/x86/vdso/vclock_gettime.c Stefani Seibold 2014-03-17 317 goto fallback;
da15cfda arch/x86/vdso/vclock_gettime.c John Stultz 2009-08-19 318 break;
2aae950b arch/x86_64/vdso/vclock_gettime.c Andi Kleen 2007-07-21 319 case CLOCK_MONOTONIC:
ce39c640 arch/x86/vdso/vclock_gettime.c Stefani Seibold 2014-03-17 320 if (do_monotonic(ts) == VCLOCK_NONE)
ce39c640 arch/x86/vdso/vclock_gettime.c Stefani Seibold 2014-03-17 321 goto fallback;
da15cfda arch/x86/vdso/vclock_gettime.c John Stultz 2009-08-19 322 break;
d3dd2426 arch/x86/entry/vdso/vclock_gettime.c [email protected] 2018-03-17 323 case CLOCK_MONOTONIC_RAW:
d3dd2426 arch/x86/entry/vdso/vclock_gettime.c [email protected] 2018-03-17 324 if (do_monotonic_raw(ts) == VCLOCK_NONE)
d3dd2426 arch/x86/entry/vdso/vclock_gettime.c [email protected] 2018-03-17 325 goto fallback;
d3dd2426 arch/x86/entry/vdso/vclock_gettime.c [email protected] 2018-03-17 326 break;
da15cfda arch/x86/vdso/vclock_gettime.c John Stultz 2009-08-19 327 case CLOCK_REALTIME_COARSE:
ce39c640 arch/x86/vdso/vclock_gettime.c Stefani Seibold 2014-03-17 328 do_realtime_coarse(ts);
ce39c640 arch/x86/vdso/vclock_gettime.c Stefani Seibold 2014-03-17 329 break;
da15cfda arch/x86/vdso/vclock_gettime.c John Stultz 2009-08-19 330 case CLOCK_MONOTONIC_COARSE:
ce39c640 arch/x86/vdso/vclock_gettime.c Stefani Seibold 2014-03-17 331 do_monotonic_coarse(ts);
ce39c640 arch/x86/vdso/vclock_gettime.c Stefani Seibold 2014-03-17 332 break;
ce39c640 arch/x86/vdso/vclock_gettime.c Stefani Seibold 2014-03-17 333 default:
ce39c640 arch/x86/vdso/vclock_gettime.c Stefani Seibold 2014-03-17 334 goto fallback;
2aae950b arch/x86_64/vdso/vclock_gettime.c Andi Kleen 2007-07-21 335 }
0d7b8547 arch/x86/vdso/vclock_gettime.c Andy Lutomirski 2011-06-05 336
a939e817 arch/x86/vdso/vclock_gettime.c John Stultz 2012-03-01 337 return 0;
ce39c640 arch/x86/vdso/vclock_gettime.c Stefani Seibold 2014-03-17 338 fallback:
ce39c640 arch/x86/vdso/vclock_gettime.c Stefani Seibold 2014-03-17 339 return vdso_fallback_gettime(clock, ts);
2aae950b arch/x86_64/vdso/vclock_gettime.c Andi Kleen 2007-07-21 340 }
2aae950b arch/x86_64/vdso/vclock_gettime.c Andi Kleen 2007-07-21 341 int clock_gettime(clockid_t, struct timespec *)
2aae950b arch/x86_64/vdso/vclock_gettime.c Andi Kleen 2007-07-21 342 __attribute__((weak, alias("__vdso_clock_gettime")));
2aae950b arch/x86_64/vdso/vclock_gettime.c Andi Kleen 2007-07-21 343

:::::: The code at line 314 was first introduced by commit
:::::: 2aae950b21e4bc789d1fc6668faf67e8748300b7 x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu

:::::: TO: Andi Kleen <[email protected]>
:::::: CC: Linus Torvalds <[email protected]>

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


Attachments:
(No filename) (5.88 kB)
.config.gz (33.49 kB)
Download all attachments

2018-03-19 22:36:13

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v4.16-rc5 1/2] x86/vdso: VDSO should handle clock_gettime(CLOCK_MONOTONIC_RAW) without syscall

Hi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on v4.16-rc4]
[also build test ERROR on next-20180319]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/jason-vas-dias-gmail-com/x86-vdso-VDSO-should-handle-clock_gettime-CLOCK_MONOTONIC_RAW-without-syscall/20180319-021527
config: x86_64-randconfig-in0-03200329 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64

Note: the linux-review/jason-vas-dias-gmail-com/x86-vdso-VDSO-should-handle-clock_gettime-CLOCK_MONOTONIC_RAW-without-syscall/20180319-021527 HEAD e19190a6cf1c0e065fe6e3b1b3073e1586d0b6a9 builds fine.
It only hurts bisectibility.

All errors (new ones prefixed by >>):

>> arch/x86/entry/vdso/vdso32.so.dbg: undefined symbols found

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


Attachments:
(No filename) (1.09 kB)
.config.gz (27.10 kB)
Download all attachments