2017-08-23 00:11:19

by Vallish Vaidyeshwara

[permalink] [raw]
Subject: [PATCH v2 0/2] enable hires timer to timeout datagram socket

v1->v2:
- Cong Wang pointed out MAX_SCHEDULE_TIMEOUT wraparound, fixed the
patch 1/2 to accommodate MAX_SCHEDULE_TIMEOUT wait time
- Changed format of printing total time from float to long in
selftests patch 2/2

Hello Dave,

Resending the patch series to include netdev mailing list with a
cover letter.

I am submitting 2 patch series to enable hires timer to timeout
datagram sockets (AF_UNIX & AF_INET domain) and test code to test
timeout accuracy on these sockets.

There has been a behavior change in 4.9 kernel with refactoring of Kernel
timer wheel in 4.8. We have a use case wherein our datagram socket
application is sensitive to socket timeout including long timeouts.

One of the test runs with a timeout value of 180 seconds timed out at
190 seconds.
[root@]# ./datagram_sock_timeout 180000
datagram_sock_timeout failed: took 190.00 seconds
[root@]#
The same program when run on a 4.4 kernel would timeout more accurately and
the kernel added slack was not noticeable to user application.

Patch 1: Has core code change of enabling hires timer to timeout datagram
socket on AF_UNIX and AF_INET domain
Patch 2: Test code to report regression in timeout behavior related to
patch 1

Vallish Vaidyeshwara (2):
net: enable high resolution timer mode to timeout datagram sockets
selftests/net: add test to verify datagram socket timeout

net/core/datagram.c | 14 ++-
tools/testing/selftests/net/Makefile | 3 +-
.../testing/selftests/net/datagram_sock_timeout.c | 119 +++++++++++++++++++++
.../selftests/net/run_datagram_sock_timeout.sh | 12 +++
4 files changed, 146 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/net/datagram_sock_timeout.c
create mode 100755 tools/testing/selftests/net/run_datagram_sock_timeout.sh

--
2.7.3.AMZN


2017-08-23 00:10:40

by Vallish Vaidyeshwara

[permalink] [raw]
Subject: [PATCH v2 2/2] selftests/net: add test to verify datagram socket timeout

AF_UNIX and AF_INET datagram sockets use high resolution timer to time
SO_RCVTIMEO value used with setsockopt(2). This test checks for the
accuracy of kernel notifying these sockets timeout to application. Test
program has code to check AF_UNIX socket, however the kernel function used
to timeout AF_INET socket is the same kernel function used by AF_UNIX as
well which is __skb_wait_for_more_packets().

Reported-by: Manjula Peiris <[email protected]>
Reviewed-by: Eduardo Valentin <[email protected]>
Reviewed-by: Anchal Agarwal <[email protected]>
Signed-off-by: Vallish Vaidyeshwara <[email protected]>
---
tools/testing/selftests/net/Makefile | 3 +-
.../testing/selftests/net/datagram_sock_timeout.c | 119 +++++++++++++++++++++
.../selftests/net/run_datagram_sock_timeout.sh | 12 +++
3 files changed, 133 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/net/datagram_sock_timeout.c
create mode 100755 tools/testing/selftests/net/run_datagram_sock_timeout.sh

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index f6c9dbf..eb5a8c7 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -3,11 +3,12 @@
CFLAGS = -Wall -Wl,--no-as-needed -O2 -g
CFLAGS += -I../../../../usr/include/

-TEST_PROGS := run_netsocktests run_afpackettests test_bpf.sh netdevice.sh
+TEST_PROGS := run_netsocktests run_afpackettests test_bpf.sh netdevice.sh run_datagram_sock_timeout.sh
TEST_GEN_FILES = socket
TEST_GEN_FILES += psock_fanout psock_tpacket
TEST_GEN_FILES += reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
TEST_GEN_FILES += reuseport_dualstack
+TEST_GEN_FILES += datagram_sock_timeout

include ../lib.mk

diff --git a/tools/testing/selftests/net/datagram_sock_timeout.c b/tools/testing/selftests/net/datagram_sock_timeout.c
new file mode 100644
index 0000000..362becf
--- /dev/null
+++ b/tools/testing/selftests/net/datagram_sock_timeout.c
@@ -0,0 +1,119 @@
+/*
+ * Copyright 2017 Amazon.com, Inc.
+ * Author: Manjula Peiris <[email protected]>
+ * Vallish Vaidyeshwara <[email protected]>
+ *
+ * selftests/net: test to verify datagram socket timeout
+ *
+ * AF_UNIX and AF_INET datagram sockets use high resolution timer to time
+ * SO_RCVTIMEO value used with setsockopt(2). This test checks for the accuracy
+ * of kernel notifying these sockets timeout to application. Test program has
+ * code to check AF_UNIX socket, however the kernel function used to timeout
+ * AF_INET socket is the same kernel function used by AF_UNIX as well which is
+ * __skb_wait_for_more_packets().
+ *
+ * License (GPLv2):
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. * See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.
+ */
+
+#include <stdlib.h>
+#include <sys/socket.h>
+#include <unistd.h>
+#include <signal.h>
+#include <stdio.h>
+#include <fcntl.h>
+#include <err.h>
+#include <errno.h>
+#include <sys/un.h>
+#include <time.h>
+#include <assert.h>
+
+#define BUF_SIZE 128
+#define KB 1024
+#define NUM_FD 2
+
+static int set_socket_timeout(int sockfd, unsigned int ms)
+{
+ int ret;
+ struct timeval timeout;
+ socklen_t cb = sizeof(timeout);
+
+ timeout.tv_sec = ms / 1000;
+ timeout.tv_usec = (ms % 1000) * 1000;
+ ret = setsockopt(sockfd, SOL_SOCKET, SO_RCVTIMEO, &timeout, cb);
+ return ret;
+}
+
+int main(int argc, char **argv)
+{
+ char err[BUF_SIZE];
+ int ret;
+ int fds[NUM_FD];
+ struct msghdr message;
+ char buffer[KB];
+ struct sockaddr_storage src_addr;
+ struct iovec iov[1];
+ time_t start, end;
+ unsigned int timeout;
+
+ iov[0].iov_base = buffer;
+ iov[0].iov_len = sizeof(buffer);
+ message.msg_name = &src_addr;
+ message.msg_namelen = sizeof(src_addr);
+ message.msg_iov = iov;
+ message.msg_iovlen = 1;
+ message.msg_control = 0;
+ message.msg_controllen = 0;
+
+ if (argc != 2) {
+ fprintf(stderr,
+ "datagram_sock_timeout failed: no timeout specified\n");
+ return -1;
+ }
+ timeout = (unsigned int)(atoi(argv[1]));
+
+ if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds) != 0) {
+ strerror_r(errno, err, BUF_SIZE);
+ fprintf(stderr, "socketpair() call failed with %s\n", err);
+ return -1;
+ }
+
+ if (set_socket_timeout(fds[0], timeout) != 0) {
+ strerror_r(errno, err, BUF_SIZE);
+ fprintf(stderr, "setsockopt() call failed with %s\n", err);
+ return -1;
+ }
+
+ start = time(NULL);
+ ret = (int)recvmsg(fds[0], &message, 0);
+ end = time(NULL);
+ if (!(ret == -1 && errno == 11)) {
+ fprintf(stderr,
+ "datagram_sock_timeout failed: test was interrupted\n");
+ return -1;
+ }
+
+ if ((end - start) != (timeout / 1000)) {
+ fprintf(stderr,
+ "datagram_sock_timeout failed: took %lu seconds\n",
+ (long)(end - start));
+ return -1;
+ }
+
+ close(fds[0]);
+ close(fds[1]);
+
+ fprintf(stderr, "datagram_sock_timeout passed\n");
+ return 0;
+}
diff --git a/tools/testing/selftests/net/run_datagram_sock_timeout.sh b/tools/testing/selftests/net/run_datagram_sock_timeout.sh
new file mode 100755
index 0000000..d5f4f82
--- /dev/null
+++ b/tools/testing/selftests/net/run_datagram_sock_timeout.sh
@@ -0,0 +1,12 @@
+#!/bin/sh
+# Runs datagram socket timeout test
+
+echo "--------------------"
+echo "running run_datagram_sock_timeout test"
+echo "--------------------"
+./datagram_sock_timeout 180000
+if [ $? -ne 0 ]; then
+ echo "[FAIL]"
+else
+ echo "[PASS]"
+fi
--
2.7.3.AMZN

2017-08-23 00:11:11

by Vallish Vaidyeshwara

[permalink] [raw]
Subject: [PATCH v2 1/2] net: enable high resolution timer mode to timeout datagram sockets

Enable high resolution timer mode to time SO_RCVTIMEO value used with
setsockopt(2) on AF_UNIX and AF_INET datagram sockets. By default,
SO_RCVTIMEO uses low resolution timer which is good for most of socket
use cases.

Background:
Kernel timer wheel was refactored in 4.8 to avoid drawbacks with previous
implementation:
https://lwn.net/Articles/691064/
Unlike the previous "kernel timer wheel" implementation in 4.4 which aimed
for accuracy by paying cost for cascading tracked timers at the boundary of
256 jiffies, the new timer wheel implementation gets rid of cascading
latency by paying a price for being less accurate for far off timers.

Use Case:
New implementation is good for most of socket use cases. However we have a
use case where our application is sensitive to socket timeout including
long timeouts. Please refer to test code as part of this patch series.
One of the test runs with a timeout value of 180 seconds timed out at
190 seconds.
[root@]# ./datagram_sock_timeout 180000
datagram_sock_timeout failed: took 190.00 seconds
[root@]#
The same program when run on a 4.4 kernel would timeout more acurately and
the kernel added slack was not noticeable to user application.

Interesting text:
a) Standards for setsockopt:
http://pubs.opengroup.org/onlinepubs/009695399/functions/setsockopt.html
<snip>
SO_RCVTIMEO
Sets the timeout value that specifies the maximum amount of time an input
function waits until it completes. It accepts a timeval structure with the
number of seconds and microseconds specifying the limit on how long to wait
for an input operation to complete. If a receive operation has blocked for
this much time without receiving additional data, it shall return with a
partial count or errno set to [EAGAIN] or [EWOULDBLOCK] if no data is
received. The default for this option is zero, which indicates that a
receive operation shall not time out. This option takes a timeval
structure. Note that not all implementations allow this option to be set.
<end snip>
This only talks about the maximum time and the current behavior indeed
follows this standard. System call does not return before the time
specified and it does return EAGAIN.
b) Man page for SETSOCKOPT(3P):
<snip>
The option_name argument specifies a single option to set. It can be
one of the socket-level options defined in <sys_socket.h> and described in
Section 2.10.16, Use of Options. If option_name is equal to SO_RCVTIMEO
or SO_SNDTIMEO and the implementation supports setting the option, it is
unspecified whether the struct timeval pointed to by option_value is
stored as provided by this function or is rounded up to align with the
resolution of the clock being used. If setsockopt() is called with
option_name equal to SO_ACCEPTCONN, SO_ERROR, or SO_TYPE, the behavior is
unspecified.
<end snip>
Behavior is unspecified.
3) Man page for SELECT(2):
<snip>
Note that the timeout interval will be rounded up to the system
clock granularity, and kernel scheduling delays mean that the blocking
interval may overrun by a small amount. If both fields of the timeval
structure are zero, then select() returns immediately. (This is useful
for polling.) If timeout is NULL (no timeout), select() can block
indefinitely.
<end snip>
Select system call guarantees timeout interval and inturn uses highres
timer.

Reported-by: Manjula Peiris <[email protected]>
Reviewed-by: Eduardo Valentin <[email protected]>
Reviewed-by: Anchal Agarwal <[email protected]>
Signed-off-by: Vallish Vaidyeshwara <[email protected]>
---
net/core/datagram.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/net/core/datagram.c b/net/core/datagram.c
index ee5647b..832c147 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -80,6 +80,7 @@ static int receiver_wake_function(wait_queue_entry_t *wait, unsigned int mode, i
return 0;
return autoremove_wake_function(wait, mode, sync, key);
}
+
/*
* Wait for the last received packet to be different from skb
*/
@@ -87,6 +88,9 @@ int __skb_wait_for_more_packets(struct sock *sk, int *err, long *timeo_p,
const struct sk_buff *skb)
{
int error;
+ ktime_t expires;
+ struct timespec64 time;
+ unsigned long pre_sched_time;
DEFINE_WAIT_FUNC(wait, receiver_wake_function);

prepare_to_wait_exclusive(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
@@ -116,7 +120,15 @@ int __skb_wait_for_more_packets(struct sock *sk, int *err, long *timeo_p,
goto interrupted;

error = 0;
- *timeo_p = schedule_timeout(*timeo_p);
+ /* Wait using highres timer */
+ time.tv_sec = *timeo_p / HZ;
+ time.tv_nsec = jiffies_to_nsecs(*timeo_p % HZ);
+ expires = ktime_add_safe(ktime_get(), timespec64_to_ktime(time));
+ pre_sched_time = jiffies;
+ if (schedule_hrtimeout(&expires, HRTIMER_MODE_ABS))
+ *timeo_p = jiffies - pre_sched_time;
+ else
+ *timeo_p = 0;
out:
finish_wait(sk_sleep(sk), &wait);
return error;
--
2.7.3.AMZN

2017-08-23 04:30:32

by David Miller

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] enable hires timer to timeout datagram socket

From: Vallish Vaidyeshwara <[email protected]>
Date: Wed, 23 Aug 2017 00:10:25 +0000

> I am submitting 2 patch series to enable hires timer to timeout
> datagram sockets (AF_UNIX & AF_INET domain) and test code to test
> timeout accuracy on these sockets.

This is not reasonable.

If you want high resolution events with real guarantees, please use
the kernel interfaces which provide this as explained to you as
feedback by other reviewers.

I'm not applying this, sorry.

2017-08-27 20:47:37

by Vallish Vaidyeshwara

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] enable hires timer to timeout datagram socket

On Tue, Aug 22, 2017 at 09:30:30PM -0700, David Miller wrote:
> From: Vallish Vaidyeshwara <[email protected]>
> Date: Wed, 23 Aug 2017 00:10:25 +0000
>
> > I am submitting 2 patch series to enable hires timer to timeout
> > datagram sockets (AF_UNIX & AF_INET domain) and test code to test
> > timeout accuracy on these sockets.
>
> This is not reasonable.
>
> If you want high resolution events with real guarantees, please use
> the kernel interfaces which provide this as explained to you as
> feedback by other reviewers.
>
> I'm not applying this, sorry.

Hello David,

I respect the decision not to upstream this patch series, however I
wanted to provide additional details. Application wanting high
resolution events with real guarantees is not the case, but the case
here is regression in system call behavior:

1) Change in system call behavior:
strace from 4.4 test run of waiting for 180 seconds on datagram socket:
10:25:48.239685 setsockopt(3, SOL_SOCKET, SO_RCVTIMEO, "\264\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0
10:25:48.239755 recvmsg(3, 0x7ffd0a3beec0, 0) = -1 EAGAIN (Resource temporarily unavailable)
10:28:48.236989 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0

strace from 4.9 test run of waiting for 180 seconds on datagram socket times out close to 195 seconds:
setsockopt(3, SOL_SOCKET, SO_RCVTIMEO, "\264\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0 <0.000028>
recvmsg(3, 0x7ffd6a2c4380, 0)?????????? = -1 EAGAIN (Resource temporarily unavailable) <194.852000>
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0 <0.000018>

This is the change in behavior of system call that is causing our application
to regress on 4.9 kernel. There are events which need to be run on timeouts
and now response time for such timeouts on 4.9 kernel are being triggered
with extended delay of close to 195 seconds as in one of the test runs
shown above.

2) Comparison with MacOS:
I ran the same test on OS X El Capitan version 10.11.6 and the behavior is
consistent with Linux 4.4 Kernel behavior. I have not tested the program on
other flavors of OS like HPUX or AIX or Solaris, but I guess if these OS
implement SO_RCVTIMEO and tested, this behavior will not be different than
Linux 4.4 kernel.
??
3) Standards Specification:
Opengroups standard does not talk about how quick SO_RCVTIMEO need to respond
for timeouts. However, the standards for select system call do mention that
timeout need to respond quickly. It would be good to restore SO_RCVTIMEO
behavior to 4.4 kernel and have SO_RCVTIMEO be consistent with select timeout.

4) Changing application code:
Any change to application code to accommodate this change of behavior in system
call breaks application migration between 4.4 kernel and 4.9 kernel.
Moreover, making application code change is not feasible in all cases as in
the case where the source code is not available (third party vendor).

Thanks.
-Vallish

2017-08-29 11:17:01

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v2 0/2] enable hires timer to timeout datagram socket

From: Vallish Vaidyeshwara
> Sent: 27 August 2017 21:47
...
> I respect the decision not to upstream this patch series, however I
> wanted to provide additional details. Application wanting high
> resolution events with real guarantees is not the case, but the case
> here is regression in system call behavior:
>
> 1) Change in system call behavior:
> strace from 4.4 test run of waiting for 180 seconds on datagram socket:
> 10:25:48.239685 setsockopt(3, SOL_SOCKET, SO_RCVTIMEO, "\264\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0
> 10:25:48.239755 recvmsg(3, 0x7ffd0a3beec0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> 10:28:48.236989 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0

IMHO you should be complaining to the main kernel group who 'broke' the
timer code, not the networking group.

David

2017-09-08 17:05:51

by Eduardo Valentin

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] enable hires timer to timeout datagram socket

David,

On Tue, Aug 22, 2017 at 09:30:30PM -0700, David Miller wrote:
> From: Vallish Vaidyeshwara <[email protected]>
> Date: Wed, 23 Aug 2017 00:10:25 +0000
>
> > I am submitting 2 patch series to enable hires timer to timeout
> > datagram sockets (AF_UNIX & AF_INET domain) and test code to test
> > timeout accuracy on these sockets.
>
> This is not reasonable.
>
> If you want high resolution events with real guarantees, please use
> the kernel interfaces which provide this as explained to you as
> feedback by other reviewers.

I understand the kernel provides other interfaces.

>
> I'm not applying this, sorry.

However, this is a clear, the system call, from the net subsystem, has changed in behavior across kernel versions. From application / userspace perspective, changing the system call without clear documentation or deprecation path, to me, looks like breaking userspace, isn't it?

If the correct recommendation is to use different system calls this should have been mentioned in system call documentation before changing its behavior, not expect the user to figure out after kernel release/upgrade, right?


--
All the best,
Eduardo Valentin

2017-09-08 17:17:01

by David Miller

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] enable hires timer to timeout datagram socket

From: Eduardo Valentin <[email protected]>
Date: Fri, 8 Sep 2017 10:04:09 -0700

> However, this is a clear, the system call, from the net subsystem,
> has changed in behavior across kernel versions. From application /
> userspace perspective, changing the system call without clear
> documentation or deprecation path, to me, looks like breaking
> userspace, isn't it?

Where is the chapter and verse of the system call documentation that
guaranteed this level of timer granularity for you?

Or were you simply relying upon implementation dependent behavior?
I can't see anything which ever guarateed the granularity of timers
to the extent upon which you were relying.

And most importantly, letting the kernel have flexibility in this area
is absolutely essential for various forms of optimizations and power
savings.

2017-09-08 17:23:36

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] enable hires timer to timeout datagram socket

On Fri, 2017-09-08 at 10:16 -0700, David Miller wrote:
> From: Eduardo Valentin <[email protected]>
> Date: Fri, 8 Sep 2017 10:04:09 -0700
>
> >
> > However, this is a clear, the system call, from the net subsystem,
> > has changed in behavior across kernel versions. From application /
> > userspace perspective, changing the system call without clear
> > documentation or deprecation path, to me, looks like breaking
> > userspace, isn't it?
>
> Where is the chapter and verse of the system call documentation that
> guaranteed this level of timer granularity for you?
>
> Or were you simply relying upon implementation dependent behavior?
> I can't see anything which ever guarateed the granularity of timers
> to the extent upon which you were relying.
>
> And most importantly, letting the kernel have flexibility in this area
> is absolutely essential for various forms of optimizations and power
> savings.

The rule we normally use, typically enforced very shoutily by Linus, is
that *however* stupid userspace was to rely on something, if they *do*
rely on it then we shouldn't change it.

I don't know that anyone's ever tried saying "show me the chapter and
verse of the documentation" to Linus when he's in full rant mode, as he
tends to get in such discussions. You could try it, I suppose.

I don't think 'HZ==100' was documented per se either, was it? Perhaps
we *could* change that, after all? :)

(Not that I've actually looked at the patch or the userspace in
question yet, mind you. Just commenting on the absurdity of the
response.)


Attachments:
smime.p7s (4.82 kB)

2017-09-08 17:26:47

by David Miller

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] enable hires timer to timeout datagram socket

From: David Woodhouse <[email protected]>
Date: Fri, 08 Sep 2017 18:23:22 +0100

> I don't know that anyone's ever tried saying "show me the chapter and
> verse of the documentation"

Do you know why I brought this up? Because the person I am replying
to told me that the syscall documentation should have suggested this
or that.

That's why.

So let's concentrate on the other aspects of my reply, ok?

2017-09-08 18:55:35

by Eduardo Valentin

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] enable hires timer to timeout datagram socket

Hello,

On Fri, Sep 08, 2017 at 10:26:45AM -0700, David Miller wrote:
> From: David Woodhouse <[email protected]>
> Date: Fri, 08 Sep 2017 18:23:22 +0100
>
> > I don't know that anyone's ever tried saying "show me the chapter and
> > verse of the documentation"
>
> Do you know why I brought this up? Because the person I am replying
> to told me that the syscall documentation should have suggested this
> or that.
>
> That's why.

:-) My intention was for sure not to upset anybody.

Just to reiterate, the point of patch is simple, there was a change in behavior in the system call from one kernel version to the other. As I mentioned, I agree that the userspace could use other means to achieve the same, but still the system call behavior has changed.

>
> So let's concentrate on the other aspects of my reply, ok?

I agree. I would prefer to understand here what is the technical reason not to accept these patches other than "use other system calls".


--
All the best,
Eduardo Valentin

2017-09-08 19:11:53

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] enable hires timer to timeout datagram socket

On Fri, 2017-09-08 at 11:55 -0700, Eduardo Valentin wrote:
> Hello,
>
> On Fri, Sep 08, 2017 at 10:26:45AM -0700, David Miller wrote:
> > From: David Woodhouse <[email protected]>
> > Date: Fri, 08 Sep 2017 18:23:22 +0100
> >
> > > I don't know that anyone's ever tried saying "show me the chapter
> and
> > > verse of the documentation"
> >
> > Do you know why I brought this up? Because the person I am replying
> > to told me that the syscall documentation should have suggested this
> > or that.
> >
> > That's why.
>
> :-) My intention was for sure not to upset anybody.
>
> Just to reiterate, the point of patch is simple, there was a change in
> behavior in the system call from one kernel version to the other. As I
> mentioned, I agree that the userspace could use other means to achieve
> the same, but still the system call behavior has changed.
>
> >
> > So let's concentrate on the other aspects of my reply, ok?
>
> I agree. I would prefer to understand here what is the technical
> reason not to accept these patches other than "use other system
> calls".

So if we need to replace all 'legacy' timers to high resolution timer,
because some application was _relying_ on jiffies being kind of precise,
maybe it is better to revert the change done on legacy timers.

Or continue the migration and make them use high res internally.

select() and poll() are the standard way to have precise timeouts,
it is silly we have to maintain a timeout handling in the datagram fast
path.



2017-09-08 21:45:00

by David Miller

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] enable hires timer to timeout datagram socket

From: Eduardo Valentin <[email protected]>
Date: Fri, 8 Sep 2017 11:55:21 -0700

> I agree. I would prefer to understand here what is the technical
> reason not to accept these patches other than "use other system
> calls".

I explained this, let me reiterate:

====================
And most importantly, letting the kernel have flexibility in this area
is absolutely essential for various forms of optimizations and power
savings.
====================

2017-09-16 09:48:10

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] enable hires timer to timeout datagram socket

On Fri, 8 Sep 2017, Eric Dumazet wrote:
> On Fri, 2017-09-08 at 11:55 -0700, Eduardo Valentin wrote:
> > Hello,
> >
> > On Fri, Sep 08, 2017 at 10:26:45AM -0700, David Miller wrote:
> > > From: David Woodhouse <[email protected]>
> > > Date: Fri, 08 Sep 2017 18:23:22 +0100
> > >
> > > > I don't know that anyone's ever tried saying "show me the chapter
> > and
> > > > verse of the documentation"
> > >
> > > Do you know why I brought this up? Because the person I am replying
> > > to told me that the syscall documentation should have suggested this
> > > or that.
> > >
> > > That's why.
> >
> > :-) My intention was for sure not to upset anybody.
> >
> > Just to reiterate, the point of patch is simple, there was a change in
> > behavior in the system call from one kernel version to the other. As I
> > mentioned, I agree that the userspace could use other means to achieve
> > the same, but still the system call behavior has changed.
> >
> > >
> > > So let's concentrate on the other aspects of my reply, ok?
> >
> > I agree. I would prefer to understand here what is the technical
> > reason not to accept these patches other than "use other system
> > calls".
>
> So if we need to replace all 'legacy' timers to high resolution timer,
> because some application was _relying_ on jiffies being kind of precise,
> maybe it is better to revert the change done on legacy timers.

Which would be a major step back in terms of timer performance and system
disturbance caused by massive recascading operations.

> Or continue the migration and make them use high res internally.
>
> select() and poll() are the standard way to have precise timeouts,
> it is silly we have to maintain a timeout handling in the datagram fast
> path.

A few years ago we switched select/poll over to use hrtimers because the
wheel timers were too inaccurate for some operations, so it feels
consequent to switch the timeout in the datagram rcv path over as well. I
agree that the whole timeout magic there feels silly, but unfortunately
it's a documented property of sockets.

Thanks,

tglx

2017-09-20 22:48:26

by Vallish Vaidyeshwara

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] enable hires timer to timeout datagram socket

On Sat, Sep 16, 2017 at 11:47:56AM +0200, Thomas Gleixner wrote:
> On Fri, 8 Sep 2017, Eric Dumazet wrote:
> > On Fri, 2017-09-08 at 11:55 -0700, Eduardo Valentin wrote:
> > > Hello,
> > >
> > > On Fri, Sep 08, 2017 at 10:26:45AM -0700, David Miller wrote:
> > > > From: David Woodhouse <[email protected]>
> > > > Date: Fri, 08 Sep 2017 18:23:22 +0100
> > > >
> > > > > I don't know that anyone's ever tried saying "show me the chapter
> > > and
> > > > > verse of the documentation"
> > > >
> > > > Do you know why I brought this up? Because the person I am replying
> > > > to told me that the syscall documentation should have suggested this
> > > > or that.
> > > >
> > > > That's why.
> > >
> > > :-) My intention was for sure not to upset anybody.
> > >
> > > Just to reiterate, the point of patch is simple, there was a change in
> > > behavior in the system call from one kernel version to the other. As I
> > > mentioned, I agree that the userspace could use other means to achieve
> > > the same, but still the system call behavior has changed.
> > >
> > > >
> > > > So let's concentrate on the other aspects of my reply, ok?
> > >
> > > I agree. I would prefer to understand here what is the technical
> > > reason not to accept these patches other than "use other system
> > > calls".
> >
> > So if we need to replace all 'legacy' timers to high resolution timer,
> > because some application was _relying_ on jiffies being kind of precise,
> > maybe it is better to revert the change done on legacy timers.
>
> Which would be a major step back in terms of timer performance and system
> disturbance caused by massive recascading operations.
>
> > Or continue the migration and make them use high res internally.
> >
> > select() and poll() are the standard way to have precise timeouts,
> > it is silly we have to maintain a timeout handling in the datagram fast
> > path.
>
> A few years ago we switched select/poll over to use hrtimers because the
> wheel timers were too inaccurate for some operations, so it feels
> consequent to switch the timeout in the datagram rcv path over as well. I
> agree that the whole timeout magic there feels silly, but unfortunately
> it's a documented property of sockets.
>
> Thanks,
>
> tglx
>

Hello Thomas,

Thanks for your comments. This patch has been NACK'ed by David Miller. Is
there any other approach to solve this problem with out application code
being recompiled?

Thanks.
-Vallish

2017-09-25 20:42:37

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] enable hires timer to timeout datagram socket

On Wed, 20 Sep 2017, Vallish Vaidyeshwara wrote:
> On Sat, Sep 16, 2017 at 11:47:56AM +0200, Thomas Gleixner wrote:
> > > So if we need to replace all 'legacy' timers to high resolution timer,
> > > because some application was _relying_ on jiffies being kind of precise,
> > > maybe it is better to revert the change done on legacy timers.
> >
> > Which would be a major step back in terms of timer performance and system
> > disturbance caused by massive recascading operations.
> >
> > > Or continue the migration and make them use high res internally.
> > >
> > > select() and poll() are the standard way to have precise timeouts,
> > > it is silly we have to maintain a timeout handling in the datagram fast
> > > path.
> >
> > A few years ago we switched select/poll over to use hrtimers because the
> > wheel timers were too inaccurate for some operations, so it feels
> > consequent to switch the timeout in the datagram rcv path over as well. I
> > agree that the whole timeout magic there feels silly, but unfortunately
> > it's a documented property of sockets.
> >
>
> Thanks for your comments. This patch has been NACK'ed by David Miller. Is
> there any other approach to solve this problem with out application code
> being recompiled?

We have only three options here:

1) Do a massive revert of the timer wheel changes and lose all the
benefits of that rework.

2) Make that timer list -> hrtimer change in the datagram code

3) Ignore it

#1 Would be pretty ironic as networking would take the biggest penalty of
the revert.

#2 Is IMO the proper solution as it cures a user space visible regression,
though the patch itself could be made way simpler

#3 Shrug

Dave, Eric?

Thanks,

tglx