2013-06-11 14:24:32

by Eliezer Tamir

Subject: [PATCH net-next 0/2] net: low latency sockets follow ups

David,

Here are two more patches.

Patch 1 removes the config menu for NET_LL_RX_POLL and defaults to y.
Patch 2 adds an SO_LL socket option to allow per-socket control of busy polling.

The patches do not depend on each other.
If one of them needs more work, please consider applying the other.

I will reply to this thread with a patch to sockperf that enables using
the socket option.

Thanks,
Eliezer


2013-06-11 14:24:36

by Eliezer Tamir

Subject: [PATCH net-next 1/2] net: remove NET_LL_RX_POLL config menu

Remove NET_LL_RX_POLL from the config menu.
Change default to y.
Busy polling still needs to be enabled at runtime via sysctl.

Signed-off-by: Eliezer Tamir <[email protected]>
---

net/Kconfig | 11 ++---------
1 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/net/Kconfig b/net/Kconfig
index d6a9ce6..8fe8845 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -244,16 +244,9 @@ config NETPRIO_CGROUP
a per-interface basis

config NET_LL_RX_POLL
- bool "Low Latency Receive Poll"
+ boolean
depends on X86_TSC
- default n
- ---help---
- Support Low Latency Receive Queue Poll.
- (For network card drivers which support this option.)
- When waiting for data in read or poll call directly into the the device driver
- to flush packets which may be pending on the device queues into the stack.
-
- If unsure, say N.
+ default y

config BQL
boolean

2013-06-11 14:24:55

by Eliezer Tamir

Subject: [PATCH net-next 2/2] net: add socket option for low latency polling

adds a socket option for low latency polling.
This allows overriding the global sysctl value with a per-socket one.

Signed-off-by: Eliezer Tamir <[email protected]>
---

arch/alpha/include/uapi/asm/socket.h | 2 ++
arch/avr32/include/uapi/asm/socket.h | 2 ++
arch/cris/include/uapi/asm/socket.h | 2 ++
arch/frv/include/uapi/asm/socket.h | 2 ++
arch/h8300/include/uapi/asm/socket.h | 2 ++
arch/ia64/include/uapi/asm/socket.h | 2 ++
arch/m32r/include/uapi/asm/socket.h | 2 ++
arch/mips/include/uapi/asm/socket.h | 2 ++
arch/mn10300/include/uapi/asm/socket.h | 2 ++
arch/parisc/include/uapi/asm/socket.h | 2 ++
arch/powerpc/include/uapi/asm/socket.h | 2 ++
arch/s390/include/uapi/asm/socket.h | 2 ++
arch/sparc/include/uapi/asm/socket.h | 2 ++
arch/xtensa/include/uapi/asm/socket.h | 2 ++
include/net/ll_poll.h | 10 +++++-----
include/net/sock.h | 2 ++
include/uapi/asm-generic/socket.h | 2 ++
net/core/sock.c | 26 ++++++++++++++++++++++++++
18 files changed, 63 insertions(+), 5 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index eee6ea7..4885825 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -81,4 +81,6 @@

#define SO_SELECT_ERR_QUEUE 45

+#define SO_LL 46
+
#endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/avr32/include/uapi/asm/socket.h b/arch/avr32/include/uapi/asm/socket.h
index 37401f5..79b6179 100644
--- a/arch/avr32/include/uapi/asm/socket.h
+++ b/arch/avr32/include/uapi/asm/socket.h
@@ -74,4 +74,6 @@

#define SO_SELECT_ERR_QUEUE 45

+#define SO_LL 46
+
#endif /* __ASM_AVR32_SOCKET_H */
diff --git a/arch/cris/include/uapi/asm/socket.h b/arch/cris/include/uapi/asm/socket.h
index ba409c9..47b1ec5 100644
--- a/arch/cris/include/uapi/asm/socket.h
+++ b/arch/cris/include/uapi/asm/socket.h
@@ -76,6 +76,8 @@

#define SO_SELECT_ERR_QUEUE 45

+#define SO_LL 46
+
#endif /* _ASM_SOCKET_H */


diff --git a/arch/frv/include/uapi/asm/socket.h b/arch/frv/include/uapi/asm/socket.h
index 31dbb5d..dbc0852 100644
--- a/arch/frv/include/uapi/asm/socket.h
+++ b/arch/frv/include/uapi/asm/socket.h
@@ -74,5 +74,7 @@

#define SO_SELECT_ERR_QUEUE 45

+#define SO_LL 46
+
#endif /* _ASM_SOCKET_H */

diff --git a/arch/h8300/include/uapi/asm/socket.h b/arch/h8300/include/uapi/asm/socket.h
index 5d1c6d0..a38d38a 100644
--- a/arch/h8300/include/uapi/asm/socket.h
+++ b/arch/h8300/include/uapi/asm/socket.h
@@ -74,4 +74,6 @@

#define SO_SELECT_ERR_QUEUE 45

+#define SO_LL 46
+
#endif /* _ASM_SOCKET_H */
diff --git a/arch/ia64/include/uapi/asm/socket.h b/arch/ia64/include/uapi/asm/socket.h
index 6b4329f..d3358b7 100644
--- a/arch/ia64/include/uapi/asm/socket.h
+++ b/arch/ia64/include/uapi/asm/socket.h
@@ -83,4 +83,6 @@

#define SO_SELECT_ERR_QUEUE 45

+#define SO_LL 46
+
#endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/m32r/include/uapi/asm/socket.h b/arch/m32r/include/uapi/asm/socket.h
index 2a3b59e..44aaf46 100644
--- a/arch/m32r/include/uapi/asm/socket.h
+++ b/arch/m32r/include/uapi/asm/socket.h
@@ -74,4 +74,6 @@

#define SO_SELECT_ERR_QUEUE 45

+#define SO_LL 46
+
#endif /* _ASM_M32R_SOCKET_H */
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index 3b21150..6a07992 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -92,4 +92,6 @@

#define SO_SELECT_ERR_QUEUE 45

+#define SO_LL 46
+
#endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/mn10300/include/uapi/asm/socket.h b/arch/mn10300/include/uapi/asm/socket.h
index b4ce844..db80fd3 100644
--- a/arch/mn10300/include/uapi/asm/socket.h
+++ b/arch/mn10300/include/uapi/asm/socket.h
@@ -74,4 +74,6 @@

#define SO_SELECT_ERR_QUEUE 45

+#define SO_LL 46
+
#endif /* _ASM_SOCKET_H */
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index 70c512a..f866fff 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -73,6 +73,8 @@

#define SO_SELECT_ERR_QUEUE 0x4026

+#define SO_LL 0x4027
+
/* O_NONBLOCK clashes with the bits used for socket types. Therefore we
* have to define SOCK_NONBLOCK to a different value here.
*/
diff --git a/arch/powerpc/include/uapi/asm/socket.h b/arch/powerpc/include/uapi/asm/socket.h
index a36daf3..405fb09 100644
--- a/arch/powerpc/include/uapi/asm/socket.h
+++ b/arch/powerpc/include/uapi/asm/socket.h
@@ -81,4 +81,6 @@

#define SO_SELECT_ERR_QUEUE 45

+#define SO_LL 46
+
#endif /* _ASM_POWERPC_SOCKET_H */
diff --git a/arch/s390/include/uapi/asm/socket.h b/arch/s390/include/uapi/asm/socket.h
index 2dacb306..0c5105fb 100644
--- a/arch/s390/include/uapi/asm/socket.h
+++ b/arch/s390/include/uapi/asm/socket.h
@@ -80,4 +80,6 @@

#define SO_SELECT_ERR_QUEUE 45

+#define SO_LL 46
+
#endif /* _ASM_SOCKET_H */
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index 89f49b6..b46c3fa 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -70,6 +70,8 @@

#define SO_SELECT_ERR_QUEUE 0x0029

+#define SO_LL 0x0030
+
/* Security levels - as per NRL IPv6 - don't actually do anything */
#define SO_SECURITY_AUTHENTICATION 0x5001
#define SO_SECURITY_ENCRYPTION_TRANSPORT 0x5002
diff --git a/arch/xtensa/include/uapi/asm/socket.h b/arch/xtensa/include/uapi/asm/socket.h
index a8f44f5..b21ace4 100644
--- a/arch/xtensa/include/uapi/asm/socket.h
+++ b/arch/xtensa/include/uapi/asm/socket.h
@@ -85,4 +85,6 @@

#define SO_SELECT_ERR_QUEUE 45

+#define SO_LL 46
+
#endif /* _XTENSA_SOCKET_H */
diff --git a/include/net/ll_poll.h b/include/net/ll_poll.h
index bc262f8..4dafd52 100644
--- a/include/net/ll_poll.h
+++ b/include/net/ll_poll.h
@@ -43,14 +43,14 @@ extern unsigned long sysctl_net_ll_poll __read_mostly;
/* we don't mind a ~2.5% imprecision */
#define TSC_MHZ (tsc_khz >> 10)

-static inline cycles_t ll_end_time(void)
+static inline cycles_t ll_end_time(struct sock *sk)
{
- return TSC_MHZ * ACCESS_ONCE(sysctl_net_ll_poll) + get_cycles();
+ return TSC_MHZ * ACCESS_ONCE(sk->sk_ll_usec) + get_cycles();
}

static inline bool sk_valid_ll(struct sock *sk)
{
- return sysctl_net_ll_poll && sk->sk_napi_id &&
+ return sk->sk_ll_usec && sk->sk_napi_id &&
!need_resched() && !signal_pending(current);
}

@@ -62,7 +62,7 @@ static inline bool can_poll_ll(cycles_t end_time)

static inline bool sk_poll_ll(struct sock *sk, int nonblock)
{
- cycles_t end_time = ll_end_time();
+ cycles_t end_time = ll_end_time(sk);
const struct net_device_ops *ops;
struct napi_struct *napi;
int rc = false;
@@ -116,7 +116,7 @@ static inline void sk_mark_ll(struct sock *sk, struct sk_buff *skb)

#else /* CONFIG_NET_LL_RX_POLL */

-static inline cycles_t ll_end_time(void)
+static inline cycles_t ll_end_time(struct sock *sk)
{
return 0;
}
diff --git a/include/net/sock.h b/include/net/sock.h
index ac8e181..21db792 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -230,6 +230,7 @@ struct cg_proto;
* @sk_wmem_queued: persistent queue size
* @sk_forward_alloc: space allocated forward
* @sk_napi_id: id of the last napi context to receive data for sk
+ * @sk_ll_usec: usecs to busypoll when there is no data
* @sk_allocation: allocation mode
* @sk_sndbuf: size of send buffer in bytes
* @sk_flags: %SO_LINGER (l_onoff), %SO_BROADCAST, %SO_KEEPALIVE,
@@ -328,6 +329,7 @@ struct sock {
#endif
#ifdef CONFIG_NET_LL_RX_POLL
unsigned int sk_napi_id;
+ unsigned int sk_ll_usec;
#endif
atomic_t sk_drops;
int sk_rcvbuf;
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index c5d2e3a..ca3a20d 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -76,4 +76,6 @@

#define SO_SELECT_ERR_QUEUE 45

+#define SO_LL 46
+
#endif /* __ASM_GENERIC_SOCKET_H */
diff --git a/net/core/sock.c b/net/core/sock.c
index 788c0da..bddc962 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -913,6 +913,23 @@ set_rcvbuf:
sock_valbool_flag(sk, SOCK_SELECT_ERR_QUEUE, valbool);
break;

+#ifdef CONFIG_NET_LL_RX_POLL
+ case SO_LL:
+ if (!capable(CAP_NET_ADMIN))
+ ret = -EACCES;
+ else {
+ unsigned long ulval;
+
+ if (!copy_from_user(&ulval, optval, sizeof(ulval))
+ && ulval >= 0)
+ sk->sk_ll_usec = ulval;
+
+ else
+ ret = -EINVAL;
+
+ }
+ break;
+#endif
default:
ret = -ENOPROTOOPT;
break;
@@ -944,6 +961,7 @@ int sock_getsockopt(struct socket *sock, int level, int optname,

union {
int val;
+ unsigned long ulval;
struct linger ling;
struct timeval tm;
} v;
@@ -1170,6 +1188,13 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
v.val = sock_flag(sk, SOCK_SELECT_ERR_QUEUE);
break;

+#ifdef CONFIG_NET_LL_RX_POLL
+ case SO_LL:
+ len = sizeof(v.ulval);
+ v.ulval = sk->sk_ll_usec;
+ break;
+#endif
+
default:
return -ENOPROTOOPT;
}
@@ -2288,6 +2313,7 @@ void sock_init_data(struct socket *sock, struct sock *sk)

#ifdef CONFIG_NET_LL_RX_POLL
sk->sk_napi_id = 0;
+ sk->sk_ll_usec = sysctl_net_ll_poll;
#endif

/*

2013-06-11 14:26:37

by Eliezer Tamir

Subject: [PATCH] sockperf: add SO_LL sockopt support

Add lls socket option support to sockperf.
Right now we always get the option before setting it, to show that the option
is working properly. We should probably remove that in an official release.
Use --lls (value in usecs) to override the global setting.
---

src/Defs.h | 3 +++
src/SockPerf.cpp | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 56 insertions(+), 1 deletions(-)

diff --git a/src/Defs.h b/src/Defs.h
index e38e3a4..87b45a0 100644
--- a/src/Defs.h
+++ b/src/Defs.h
@@ -161,6 +161,7 @@ enum {
OPT_OUTPUT_PRECISION, //35
OPT_CLIENTPORT, //36
OPT_CLIENTIP, //37
+ OPT_LLS, //38
};

#define MODULE_NAME "sockperf"
@@ -527,6 +528,8 @@ struct user_params_t {
// bool stream_mode; - use b_stream instead
int mthread_server;
struct timeval* select_timeout;
+ unsigned long lls_usecs;
+ bool lls_is_set;
int sock_buff_size;
int threads_num;
char threads_affinity[MAX_ARGV_SIZE];
diff --git a/src/SockPerf.cpp b/src/SockPerf.cpp
index 41daf95..d76320f 100644
--- a/src/SockPerf.cpp
+++ b/src/SockPerf.cpp
@@ -207,6 +207,10 @@ static const AOPT_DESC common_opt_desc[] =
"Limit the lifetime of the message (default 2)."
},
{
+ OPT_LLS, AOPT_ARG, aopt_set_literal( 0 ), aopt_set_string( "lls" ),
+ "Turn on LLS via socket option (value = us to poll)."
+ },
+ {
OPT_BUFFER_SIZE, AOPT_ARG, aopt_set_literal( 0 ), aopt_set_string( "buffer-size" ),
"Set total socket receive/send buffer <size> in bytes (system defined by default)."
},
@@ -292,7 +296,7 @@ static int proc_mode_help( int id, int argc, const char **argv )
int i = 0;

printf(MODULE_NAME " is a tool for testing network latency and throughput.\n");
- printf("version %s\n", STR(VERSION));
+ printf("version %s-lls\n", STR(VERSION));
printf("\n");
printf("Usage: " MODULE_NAME " <subcommand> [options] [args]\n");
printf("Type: \'" MODULE_NAME " <subcommand> --help\' for help on a specific subcommand.\n");
@@ -1789,6 +1793,26 @@ static int parse_common_opt( const AOPT_OBJECT *common_obj )
s_user_params.is_nonblocked_send = true;
}

+ if ( !rc && aopt_check(common_obj, OPT_LLS) ) {
+ const char* optarg = aopt_value(common_obj, OPT_LLS);
+ if (optarg) {
+ errno = 0;
+ int value = strtoul(optarg, NULL, 0);
+ if (errno != 0 || value < 0) {
+ log_msg("'-%d' Invalid LLS value: %s", OPT_LLS, optarg);
+ rc = SOCKPERF_ERR_BAD_ARGUMENT;
+ }
+ else {
+ s_user_params.lls_usecs = value;
+ s_user_params.lls_is_set = true;
+ }
+ }
+ else {
+ log_msg("'-%d' Invalid value", OPT_LLS);
+ rc = SOCKPERF_ERR_BAD_ARGUMENT;
+ }
+ }
+
if ( !rc && aopt_check(common_obj, OPT_RECV_LOOPING) ) {

const char* optarg = aopt_value(common_obj, OPT_RECV_LOOPING);
@@ -2296,6 +2320,29 @@ int sock_set_reuseaddr(int fd)
return rc;
}

+#ifndef SO_LL
+#define SO_LL 46
+#endif
+int sock_set_lls(int fd)
+{
+ int rc = SOCKPERF_ERR_NONE;
+ unsigned long lls;
+ int size = sizeof(lls);
+
+ if(getsockopt(fd, SOL_SOCKET, SO_LL, &lls, (socklen_t *)&size) < 0){
+ log_err("getsockopt(SO_LL) failed");
+ return SOCKPERF_ERR_SOCKET;
+ } else
+ log_msg("socket option SO_LL default was %lu, changing to %lu", lls, s_user_params.lls_usecs);
+
+ if (setsockopt(fd, SOL_SOCKET, SO_LL, &(s_user_params.lls_usecs), sizeof(s_user_params.lls_usecs)) < 0) {
+ log_err("setsockopt(SO_LL) failed");
+ rc = SOCKPERF_ERR_SOCKET;
+ }
+ return rc;
+}
+
+
int sock_set_snd_rcv_bufs(int fd)
{
/*
@@ -2460,6 +2507,11 @@ int prepare_socket(int fd, struct fds_data *p_data)
}

if (!rc &&
+ (s_user_params.lls_is_set == true))
+ {
+ rc = sock_set_lls(fd);
+ }
+ if (!rc &&
(s_user_params.sock_buff_size > 0))
{
rc = sock_set_snd_rcv_bufs(fd);




2013-06-11 14:45:38

by Eric Dumazet

Subject: Re: [PATCH net-next 2/2] net: add socket option for low latency polling

On Tue, 2013-06-11 at 17:24 +0300, Eliezer Tamir wrote:
> adds a socket option for low latency polling.
> This allows overriding the global sysctl value with a per-socket one.
>
> Signed-off-by: Eliezer Tamir <[email protected]>
> ---
>
>
> -static inline cycles_t ll_end_time(void)
> +static inline cycles_t ll_end_time(struct sock *sk)
> {
> - return TSC_MHZ * ACCESS_ONCE(sysctl_net_ll_poll) + get_cycles();
> + return TSC_MHZ * ACCESS_ONCE(sk->sk_ll_usec) + get_cycles();
> }

Hmm, sk_ll_usec is an unsigned int.

(cycles_t)TSC_MHZ * ACCESS_ONCE(sk->sk_ll_usec) + get_cycles();

>
> +#ifdef CONFIG_NET_LL_RX_POLL
> + case SO_LL:
> + if (!capable(CAP_NET_ADMIN))
> + ret = -EACCES;
> + else {
> + unsigned long ulval;

Why an "unsigned long" ?


> +
> + if (!copy_from_user(&ulval, optval, sizeof(ulval))
> + && ulval >= 0)
> + sk->sk_ll_usec = ulval;
> +
> + else
> + ret = -EINVAL;

Just use
sk->sk_ll_usec = val;

> +
> + }
> + break;
> +#endif
> default:
> ret = -ENOPROTOOPT;
> break;
> @@ -944,6 +961,7 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
>
> union {
> int val;
> + unsigned long ulval;

Why an unsigned long ? 32bit are more than enough.

> struct linger ling;
> struct timeval tm;
> } v;
> @@ -1170,6 +1188,13 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
> v.val = sock_flag(sk, SOCK_SELECT_ERR_QUEUE);
> break;
>
> +#ifdef CONFIG_NET_LL_RX_POLL
> + case SO_LL:
> + len = sizeof(v.ulval);
> + v.ulval = sk->sk_ll_usec;
> + break;
> +#endif
> +


2013-06-11 15:37:38

by Eliezer Tamir

Subject: Re: [PATCH net-next 2/2] net: add socket option for low latency polling

On 11/06/2013 17:45, Eric Dumazet wrote:
> On Tue, 2013-06-11 at 17:24 +0300, Eliezer Tamir wrote:
>> adds a socket option for low latency polling.
>> This allows overriding the global sysctl value with a per-socket one.
>>
>> Signed-off-by: Eliezer Tamir <[email protected]>
>> ---
>>
>>
>> -static inline cycles_t ll_end_time(void)
>> +static inline cycles_t ll_end_time(struct sock *sk)
>> {
>> - return TSC_MHZ * ACCESS_ONCE(sysctl_net_ll_poll) + get_cycles();
>> + return TSC_MHZ * ACCESS_ONCE(sk->sk_ll_usec) + get_cycles();
>> }
>
> Hmm, sk_ll_usec is an unsigned int.

We changed it to an unsigned long in v7; I guess that was gratuitous.
Re-reading your comments on v6 2/5 I realize a cast would have sufficed.
Should I send a patch to revert it to an unsigned int?

Thanks,
Eliezer

2013-06-11 15:59:10

by Eric Dumazet

Subject: Re: [PATCH net-next 2/2] net: add socket option for low latency polling

On Tue, 2013-06-11 at 18:37 +0300, Eliezer Tamir wrote:
> On 11/06/2013 17:45, Eric Dumazet wrote:
> > On Tue, 2013-06-11 at 17:24 +0300, Eliezer Tamir wrote:
> >> adds a socket option for low latency polling.
> >> This allows overriding the global sysctl value with a per-socket one.
> >>
> >> Signed-off-by: Eliezer Tamir <[email protected]>
> >> ---
> >>
> >>
> >> -static inline cycles_t ll_end_time(void)
> >> +static inline cycles_t ll_end_time(struct sock *sk)
> >> {
> >> - return TSC_MHZ * ACCESS_ONCE(sysctl_net_ll_poll) + get_cycles();
> >> + return TSC_MHZ * ACCESS_ONCE(sk->sk_ll_usec) + get_cycles();
> >> }
> >
> > Hmm, sk_ll_usec is an unsigned int.
>
> We changed it to an unsigned long in v7, I guess that was gratuitous.
> Re-reading your comments on v6 2/5 I realize a cast would have sufficed.
> Should I send a patch to revert it to an unsigned int?

One sysctl as unsigned long was not a big deal so I was ok with your
change ;)

unsigned int for sk_ll_usec is enough, but using a 32x32->64bit multiply
is probably safer.



2013-06-11 20:24:42

by Ben Hutchings

Subject: Re: [PATCH net-next 2/2] net: add socket option for low latency polling

On Tue, 2013-06-11 at 17:24 +0300, Eliezer Tamir wrote:
> adds a socket option for low latency polling.
> This allows overriding the global sysctl value with a per-socket one.
[...]
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -913,6 +913,23 @@ set_rcvbuf:
> sock_valbool_flag(sk, SOCK_SELECT_ERR_QUEUE, valbool);
> break;
>
> +#ifdef CONFIG_NET_LL_RX_POLL
> + case SO_LL:
> + if (!capable(CAP_NET_ADMIN))
> + ret = -EACCES;
[...]

Failed capability checks normally result in EPERM whereas EACCES usually
results from a file permission/ACL/label check.

Perhaps unprivileged users should be allowed to set a value as long as
it's less than or equal to the global value?

Ben.

--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

2013-06-12 06:39:19

by Eliezer Tamir

Subject: Re: [PATCH net-next 2/2] net: add socket option for low latency polling

On 11/06/2013 23:24, Ben Hutchings wrote:
> On Tue, 2013-06-11 at 17:24 +0300, Eliezer Tamir wrote:
>> adds a socket option for low latency polling.
>> This allows overriding the global sysctl value with a per-socket one.
> [...]
>> --- a/net/core/sock.c
>> +++ b/net/core/sock.c
>> @@ -913,6 +913,23 @@ set_rcvbuf:
>> sock_valbool_flag(sk, SOCK_SELECT_ERR_QUEUE, valbool);
>> break;
>>
>> +#ifdef CONFIG_NET_LL_RX_POLL
>> + case SO_LL:
>> + if (!capable(CAP_NET_ADMIN))
>> + ret = -EACCES;
> [...]
>
> Failed capability checks normally result in EPERM whereas EACCES usually
> results from a file permission/ACL/label check.

OK

> Perhaps unprivileged users should be allowed to set a value as long as
> it's less than or equal to the global value?

I thought of allowing unprivileged users to disable it, but this sounds
better.

Thanks!

2013-06-12 08:38:58

by Amir Vadai

Subject: Re: [PATCH] sockperf: add SO_LL sockopt support

On 11/06/2013 17:26, Eliezer Tamir wrote:
> Add lls socket option support to sockperf.
> Right now we always get the option before set to show the option is
> working properly. We should probably remove that in an official release.
> use --lls (value in usecs) to override global setting.
> ---
>
> src/Defs.h | 3 +++
> src/SockPerf.cpp | 54
> +++++++++++++++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 56 insertions(+), 1 deletions(-)
>
> diff --git a/src/Defs.h b/src/Defs.h
> index e38e3a4..87b45a0 100644
> --- a/src/Defs.h
> +++ b/src/Defs.h
> @@ -161,6 +161,7 @@ enum {
> OPT_OUTPUT_PRECISION, //35
> OPT_CLIENTPORT, //36
> OPT_CLIENTIP, //37
> + OPT_LLS, //38
> };
>
> #define MODULE_NAME "sockperf"
> @@ -527,6 +528,8 @@ struct user_params_t {
> // bool stream_mode; - use b_stream instead
> int mthread_server;
> struct timeval* select_timeout;
> + unsigned long lls_usecs;
> + bool lls_is_set;
> int sock_buff_size;
> int threads_num;
> char threads_affinity[MAX_ARGV_SIZE];
> diff --git a/src/SockPerf.cpp b/src/SockPerf.cpp
> index 41daf95..d76320f 100644
> --- a/src/SockPerf.cpp
> +++ b/src/SockPerf.cpp
> @@ -207,6 +207,10 @@ static const AOPT_DESC common_opt_desc[] =
> "Limit the lifetime of the message (default 2)."
> },
> {
> + OPT_LLS, AOPT_ARG, aopt_set_literal( 0 ), aopt_set_string(
> "lls" ),
> + "Turn on LLS via socket option (value = us to poll)."
> + },
> + {
> OPT_BUFFER_SIZE, AOPT_ARG, aopt_set_literal( 0 ),
> aopt_set_string( "buffer-size" ),
> "Set total socket receive/send buffer <size> in bytes (system
> defined by default)."
> },
> @@ -292,7 +296,7 @@ static int proc_mode_help( int id, int argc, const
> char **argv )
> int i = 0;
>
> printf(MODULE_NAME " is a tool for testing network latency and
> throughput.\n");
> - printf("version %s\n", STR(VERSION));
> + printf("version %s-lls\n", STR(VERSION));
> printf("\n");
> printf("Usage: " MODULE_NAME " <subcommand> [options] [args]\n");
> printf("Type: \'" MODULE_NAME " <subcommand> --help\' for help on a
> specific subcommand.\n");
> @@ -1789,6 +1793,26 @@ static int parse_common_opt( const AOPT_OBJECT
> *common_obj )
> s_user_params.is_nonblocked_send = true;
> }
>
> + if ( !rc && aopt_check(common_obj, OPT_LLS) ) {
> + const char* optarg = aopt_value(common_obj, OPT_LLS);
> + if (optarg) {
> + errno = 0;
> + int value = strtoul(optarg, NULL, 0);
> + if (errno != 0 || value < 0) {
> + log_msg("'-%d' Invalid LLS value: %s", OPT_LLS,
> optarg);
> + rc = SOCKPERF_ERR_BAD_ARGUMENT;
> + }
> + else {
> + s_user_params.lls_usecs = value;
> + s_user_params.lls_is_set = true;
> + }
> + }
> + else {
> + log_msg("'-%d' Invalid value", OPT_LLS);
> + rc = SOCKPERF_ERR_BAD_ARGUMENT;
> + }
> + }
> +
> if ( !rc && aopt_check(common_obj, OPT_RECV_LOOPING) ) {
>
> const char* optarg = aopt_value(common_obj, OPT_RECV_LOOPING);
> @@ -2296,6 +2320,29 @@ int sock_set_reuseaddr(int fd)
> return rc;
> }
>
> +#ifndef SO_LL
> +#define SO_LL 46
> +#endif
> +int sock_set_lls(int fd)
> +{
> + int rc = SOCKPERF_ERR_NONE;
> + unsigned long lls;
> + int size = sizeof(lls);
> +
> + if(getsockopt(fd, SOL_SOCKET, SO_LL, &lls, (socklen_t *)&size) < 0){
> + log_err("getsockopt(SO_LL) failed");
> + return SOCKPERF_ERR_SOCKET;
> + } else
> + log_msg("socket option SO_LL default was %lu, changing to %lu",
> lls, s_user_params.lls_usecs);
> +
> + if (setsockopt(fd, SOL_SOCKET, SO_LL, &(s_user_params.lls_usecs),
> sizeof(s_user_params.lls_usecs)) < 0) {
> + log_err("setsockopt(SO_LL) failed");
> + rc = SOCKPERF_ERR_SOCKET;
> + }
> + return rc;
> +}
> +
> +
> int sock_set_snd_rcv_bufs(int fd)
> {
> /*
> @@ -2460,6 +2507,11 @@ int prepare_socket(int fd, struct fds_data *p_data)
> }
>
> if (!rc &&
> + (s_user_params.lls_is_set == true))
> + {
> + rc = sock_set_lls(fd);
> + }
> + if (!rc &&
> (s_user_params.sock_buff_size > 0))
> {
> rc = sock_set_snd_rcv_bufs(fd);
>
>
>
>
>

Hi Eliezer,

Added sockperf maintainers to CC list.

To upload changes to the sockperf code, please open a ticket on the sockperf
Google Code site and then submit the patch
(https://code.google.com/p/sockperf/issues/).

You will need to wrap it in " #ifndef w-i-n-d-o-w-s" but sockperf
maintainers can help you do that on the sockperf pages.

You should look at the recently added TOS option, which is very similar
(https://code.google.com/p/sockperf/issues/detail?id=44).

Amir

2013-06-12 08:45:32

by Eliezer Tamir

Subject: Re: [PATCH] sockperf: add SO_LL sockopt support

On 12/06/2013 11:36, Amir Vadai wrote:
> On 11/06/2013 17:26, Eliezer Tamir wrote:
>> Add lls socket option support to sockperf.
>> Right now we always get the option before set to show the option is
>> working properly. We should probably remove that in an official release.
>> use --lls (value in usecs) to override global setting.
>> ---

> Eliezer Hi,
>
> Added sockperf maintainers to CC list.

Thank you.

Please note that I plan on revising this patch, so don't add it just
yet. (I will CC the maintainers on further versions.)

-Eliezer

2013-06-12 22:12:16

by David Miller

Subject: Re: [PATCH net-next 1/2] net: remove NET_LL_RX_POLL config menu

From: Eliezer Tamir <[email protected]>
Date: Tue, 11 Jun 2013 17:24:28 +0300

> depends on X86_TSC

Wait a second, I didn't notice this before. There needs to be a better
way to test for the accuracy you need, or if the issue is lack of a proper
API for cycle counter reading, fix that rather than add ugly arch
specific dependencies to generic networking code.

2013-06-13 02:01:10

by Stephen Hemminger

Subject: Re: [PATCH net-next 1/2] net: remove NET_LL_RX_POLL config menu

On Wed, 12 Jun 2013 15:12:05 -0700 (PDT)
David Miller <[email protected]> wrote:

> From: Eliezer Tamir <[email protected]>
> Date: Tue, 11 Jun 2013 17:24:28 +0300
>
> > depends on X86_TSC
>
> Wait a second, I didn't notice this before. There needs to be a better
> way to test for the accuracy you need, or if the issue is lack of a proper
> API for cycle counter reading, fix that rather than add ugly arch
> specific dependencies to generic networking code.

This should be sched_clock(), rather than direct TSC access.
Also any code using TSC or sched_clock has to be carefully audited to deal with
clocks running at different rates on different CPUs. Basically, the value is only
meaningful on the same CPU.

2013-06-13 02:13:41

by Eliezer Tamir

Subject: Re: [PATCH net-next 1/2] net: remove NET_LL_RX_POLL config menu

On 13/06/2013 05:01, Stephen Hemminger wrote:
> On Wed, 12 Jun 2013 15:12:05 -0700 (PDT)
> David Miller <[email protected]> wrote:
>
>> From: Eliezer Tamir <[email protected]>
>> Date: Tue, 11 Jun 2013 17:24:28 +0300
>>
>>> depends on X86_TSC
>>
>> Wait a second, I didn't notice this before. There needs to be a better
>> way to test for the accuracy you need, or if the issue is lack of a proper
>> API for cycle counter reading, fix that rather than add ugly arch
>> specific dependencies to generic networking code.
>
> This should be sched_clock(), rather than direct TSC access.
> Also any code using TSC or sched_clock has to be carefully audited to deal with
> clocks running at different rates on different CPU's. Basically value is only
> meaning full on same CPU.

OK,

If we convert to sched_clock(), would adding a define such as
HAVE_HIGH_PRECISION_CLOCK to architectures that have both a high
precision clock and a 64 bit cycles_t be a good solution?

(if not any other suggestion?)

2013-06-13 08:01:22

by Daniel Borkmann

Subject: Re: [PATCH net-next 1/2] net: remove NET_LL_RX_POLL config menu

On 06/13/2013 04:13 AM, Eliezer Tamir wrote:
> On 13/06/2013 05:01, Stephen Hemminger wrote:
>> On Wed, 12 Jun 2013 15:12:05 -0700 (PDT)
>> David Miller <[email protected]> wrote:
>>
>>> From: Eliezer Tamir <[email protected]>
>>> Date: Tue, 11 Jun 2013 17:24:28 +0300
>>>
>>>> depends on X86_TSC
>>>
>>> Wait a second, I didn't notice this before. There needs to be a better
>>> way to test for the accuracy you need, or if the issue is lack of a proper
>>> API for cycle counter reading, fix that rather than add ugly arch
>>> specific dependencies to generic networking code.
>>
>> This should be sched_clock(), rather than direct TSC access.
>> Also any code using TSC or sched_clock has to be carefully audited to deal with
>> clocks running at different rates on different CPU's. Basically value is only
>> meaning full on same CPU.
>
> OK,
>
> If we covert to sched_clock(), would adding a define such as HAVE_HIGH_PRECISION_CLOCK to architectures that have both a high precision clock and a 64 bit cycles_t be a good solution?
>
> (if not any other suggestion?)

Hm, probably cpu_clock() and similar might be better, since they use
sched_clock() in the background when !CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
(meaning when sched_clock() provides synchronized highres time source from
the architecture), and, quoting ....

Otherwise it tries to create a semi stable clock from a mixture of other
clocks, including:

- GTOD (clock monotomic)
- sched_clock()
- explicit idle events

But yeah, it needs to be evaluated regarding the drift between CPUs in
general.

Then, eventually, you could get rid of the entire NET_LL_RX_POLL config
option plus related ifdefs in the code and have it built-in in general?

2013-06-13 10:09:45

by Eliezer Tamir

Subject: Re: [PATCH net-next 1/2] net: remove NET_LL_RX_POLL config menu

On 13/06/2013 11:00, Daniel Borkmann wrote:
> On 06/13/2013 04:13 AM, Eliezer Tamir wrote:
>> On 13/06/2013 05:01, Stephen Hemminger wrote:
>>> On Wed, 12 Jun 2013 15:12:05 -0700 (PDT)
>>> David Miller <[email protected]> wrote:
>>>
>>>> From: Eliezer Tamir <[email protected]>
>>>> Date: Tue, 11 Jun 2013 17:24:28 +0300
>>>>
>>>>> depends on X86_TSC
>>>>
>>>> Wait a second, I didn't notice this before. There needs to be a better
>>>> way to test for the accuracy you need, or if the issue is lack of a
>>>> proper
>>>> API for cycle counter reading, fix that rather than add ugly arch
>>>> specific dependencies to generic networking code.
>>>
>>> This should be sched_clock(), rather than direct TSC access.
>>> Also any code using TSC or sched_clock has to be carefully audited to
>>> deal with
>>> clocks running at different rates on different CPU's. Basically value
>>> is only
>>> meaning full on same CPU.
>>
>> OK,
>>
>> If we covert to sched_clock(), would adding a define such as
>> HAVE_HIGH_PRECISION_CLOCK to architectures that have both a high
>> precision clock and a 64 bit cycles_t be a good solution?
>>
>> (if not any other suggestion?)
>
> Hm, probably cpu_clock() and similar might be better, since they use
> sched_clock() in the background when !CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
> (meaning when sched_clock() provides synchronized highres time source from
> the architecture), and, quoting ....

I don't think we want the overhead of disabling IRQs
that cpu_clock() adds.

We don't really care about precise measurement.
All we need is a sane cut-off for busy polling.
It's no big deal if on a rare occasion we poll less,
or even poll twice the time.
As long as it's rare it should not matter.

Maybe the answer is not to use cycle counting at all?
Maybe just wait the full sk_rcvtimeo?
(resched() when proper, bail out if signal pending, etc.)

This could only be a safe/sane thing to do after we add
a socket option, because this can't be a global setting.

This would of course turn the option into a flag.
If it's set (and !nonblock), busy wait up to sk_rcvtimeo.

Opinions?