From: Arnd Bergmann
To: y2038@lists.linaro.org
Cc: Thomas Gleixner, pang.xunlei@linaro.org, Peter Zijlstra,
    Benjamin Herrenschmidt, Heiko Carstens, Paul Mackerras, cl@linux.com,
    Ingo Molnar, heenasirwani@gmail.com, linux-arch@vger.kernel.org,
    linux-s390@vger.kernel.org, mpe@ellerman.id.au,
    rafael.j.wysocki@intel.com, ahh@google.com, Frederic Weisbecker,
    "Paul E. McKenney", pjt@google.com, riel@redhat.com,
    richardcochran@gmail.com, Tejun Heo, John Stultz, rth@twiddle.net,
    Baolin Wang, gregkh@linuxfoundation.org, LKML, netdev@vger.kernel.org,
    Martin Schwidefsky, linux390@de.ibm.com, linuxppc-dev@lists.ozlabs.org
Subject: Re: [Y2038] [PATCH 04/11] posix timers:Introduce the 64bit methods with timespec64 type for k_clock structure
Date: Wed, 22 Apr 2015 13:07:44 +0200
Message-ID: <2233518.Z2Q4dpO62C@wuerfel>
References: <1429509459-17068-1-git-send-email-baolin.wang@linaro.org>

On Wednesday 22 April 2015 10:45:23 Thomas Gleixner wrote:
> On Tue, 21 Apr 2015, Thomas Gleixner wrote:
> So we could save one translation step if we
> implement new syscalls
> which have a scalar nsec interface instead of the timespec/timeval
> cruft and let user space do the translation to whatever it wants.
>
> So
>
> sys_clock_nanosleep(const clockid_t which_clock, int flags,
> 		      const struct timespec __user *expires,
> 		      struct timespec __user *reminder)
>
> would get the new syscall variant:
>
> sys_clock_nanosleep_ns(const clockid_t which_clock, int flags,
> 			 const s64 expires, s64 __user *reminder)

As you might expect, there are a number of complications with this
approach:

- John Stultz likes to point out that it's easier to do one change
  at a time, so extending the interface to 64-bit has less potential
  of breaking things than a more fundamental change. I think it's
  useful to drop a lot of the syscalls when a more modern version
  is around (e.g. let libc implement usleep and nanosleep through
  clock_nanosleep), but keep the syscalls as close to the known-working
  64-bit versions as we can.

- The inode timestamp related syscalls (stat, utimes and variants
  thereof) require the full range of time64_t and cannot use ktime_t.

- converting between timespec types of different size is cheap,
  converting timespec to ktime_t is still relatively cheap, but
  converting ktime_t to timespec is rather expensive (at least
  eight 32-bit multiplies, plus a few shifts and additions if you
  don't have 64-bit arithmetic).

- ioctls that pass a timespec need to keep doing that or would
  require a source-level change in user space instead of recompiling.

> I personally would welcome such an interface as it makes user space
> programming simpler. Just (re)arming a periodic nanosleep based on
> absolute expiry time is horrible stupid today:
>
> 	struct timespec expires;
> 	....
> 	while ()
> 		expires.tv_nsec += period.tv_nsec;
> 		expires.tv_sec += period.tv_sec;
> 		normalize_timespec(&expires);
> 		sys_clock_nanosleep(CLOCK_ID, ABS, &expires, NULL);
>
> So with a scalar interface this would reduce to:
>
> 	s64 expires;
> 	....
> 	while ()
> 		expires += period;
> 		sys_clock_nanosleep_ns(CLOCK_ID, ABS, &expires, NULL);
>
> There is a difference both in text and storage size plus the avoidance
> of the two translation steps (one translation step on 64bit).

We should probably look at it separately for each syscall. It's
quite possible that we find a number of them for which it helps
and others for which it hurts, so we need to see the big picture.

There are also a few other calls that will never need 64-bit
time_t because the range is limited by the need to only ever
pass relative timeouts (select, poll, io_getevents, recvmmsg,
clock_getres, rt_sigtimedwait, sched_rr_get_interval, getrusage,
waitid, semtimedop, sysinfo), so we could actually leave them
using a 32-bit structure and have the libc do the conversion.

> I know that this is non portable, but OTOH if I look at the non
> portable mechanisms which are used by data bases, java VMs and other
> apps which exist to squeeze the last cycles out of the system, there
> is certainly some value to that.
>
> The portable/spec conforming apps can still use the user space
> assisted translated timespec/timeval mechanisms.
>
> There is one caveat though: sys_clock_gettime and sys_gettimeofday
> will still need a syscall_timespec64 variant. We have no double
> translation steps there because we maintain the timespec
> representation in the timekeeping code for performance reasons to
> avoid the division in the syscall interface. But everything else can
> do nicely without the timespec cruft.
>
> We really should talk to libc folks and high performance users about
> this before blindly adding a gazillion of new timespec64 based
> interfaces.

I've started a list of affected syscalls at
https://docs.google.com/spreadsheets/d/1HCYwHXxs48TsTb6IGUduNjQnmfRvMPzCN6T_0YiQwis/edit?usp=sharing

Still adding more calls and description, let me know if you want
edit permissions.
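
To make the arithmetic in this thread concrete, here is a rough
userspace sketch (not kernel code) of the two ways to advance an
absolute deadline, plus the conversions between them. All sketch_*
names and the SKETCH_NSEC_PER_SEC constant are made up for this
illustration, and the proposed clock_nanosleep_ns() syscall is not
called because it does not exist. The sketch assumes non-negative
times and a period whose tv_nsec is already below one second.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define SKETCH_NSEC_PER_SEC 1000000000LL

/* Hypothetical helper: fold excess nanoseconds back into seconds. */
static void sketch_normalize(struct timespec *ts)
{
	while (ts->tv_nsec >= SKETCH_NSEC_PER_SEC) {
		ts->tv_nsec -= SKETCH_NSEC_PER_SEC;
		ts->tv_sec++;
	}
}

/* timespec -> scalar ns: one 64-bit multiply and one add. */
static int64_t sketch_timespec_to_ns(const struct timespec *ts)
{
	return (int64_t)ts->tv_sec * SKETCH_NSEC_PER_SEC + ts->tv_nsec;
}

/*
 * Scalar ns -> timespec: needs a 64-bit division and remainder, which
 * is the costly direction on machines without 64-bit arithmetic.
 * Only valid for ns >= 0 in this sketch.
 */
static struct timespec sketch_ns_to_timespec(int64_t ns)
{
	struct timespec ts;

	ts.tv_sec = (time_t)(ns / SKETCH_NSEC_PER_SEC);
	ts.tv_nsec = (long)(ns % SKETCH_NSEC_PER_SEC);
	return ts;
}

/*
 * Advance a deadline of 10.9s by 'periods' steps of 250ms in both
 * representations and check that they agree after every step.
 * Returns the final deadline in nanoseconds.
 */
static int64_t sketch_advance_both(int periods)
{
	struct timespec expires = { .tv_sec = 10, .tv_nsec = 900000000L };
	const struct timespec period = { .tv_sec = 0, .tv_nsec = 250000000L };
	int64_t expires_ns = sketch_timespec_to_ns(&expires);
	const int64_t period_ns = sketch_timespec_to_ns(&period);

	for (int i = 0; i < periods; i++) {
		/* timespec variant: two adds plus a normalization step */
		expires.tv_nsec += period.tv_nsec;
		expires.tv_sec += period.tv_sec;
		sketch_normalize(&expires);

		/* scalar variant: a single 64-bit add */
		expires_ns += period_ns;

		assert(sketch_timespec_to_ns(&expires) == expires_ns);
	}
	return expires_ns;
}
```

Calling sketch_advance_both(3) walks the deadline from 10.9s to
11.65s. The scalar loop body is one addition, while any interface
that has to hand a timespec back to user space pays for the division
in sketch_ns_to_timespec().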
	Arnd